BRIEF CONTENTS


PART ONE: Principles of Test Construction

1. Introduction to Measurement 3
2. Test Construction 21
3. Item Writing 36
4. Item Analysis 57
5. Reliability 79
6. Validity 104
7. Norms and Test Scales 125
8. Response Set in Test Scores 140


PART TWO: Principles of Measurement

9. Measurement of Intelligence, Aptitude and Achievement 149
10. Measurement of Personality 186
11. Projective Techniques 219
12. Techniques of Observation and Data Collection 267
13. Scaling Techniques 315


PART THREE: Principles of Research Methodology

14. Sampling 365
15. Social Scientific Research 392
16. Single-Subject Experimental Research and Small N Research 433
17. Historical Research 446
18. The Problem and The Hypothesis 451
19. Reviewing the Literature 466
20. Variables 474
21. Research Design 491
22. Qualitative Research 555
23. Carrying Out Statistical Analyses 593
24. Writing a Research Report and a Research Proposal 669


Objective Questions 683
Appendices and References 711
Glossary 753
Subject Index 777


CONTENTS 


PART ONE: Principles of Test Construction


1. INTRODUCTION TO MEASUREMENT 3
Measurement and Evaluation 3
The History of Psychological Measurement and Mental Testing 4
Levels of Measurement (or Measurement Scales)
Properties of Scales of Measurement 13
Functions of Measurement 13
Distinction between Psychological Measurement and Physical Measurement 15
Steps in Measurement Process 16
Problems Related to the Measurement Process 17
General Problems of Measurement 18
Sources of Errors in Measurement 19
Difference among Assessment, Testing and Measurement 20
Review Questions 20


2. TEST CONSTRUCTION 21 
Meaning of Test in Psychology and Education 21 
Classification of Tests 23 
Characteristics of a Good Test 25 
General Steps of Test Construction 26 
Uses and Limitations of Psychological Tests and Testings 29
Ethical Issues in Psychological Testing 31 
Steps in Selecting Appropriate Published Tests 33 
Review Questions 35 


3. ITEM WRITING 36
Meaning and Types of Items 36
Difference Between Essay-Type Tests and Objective-Type Tests 51
General Guidelines for Item Writing 54
General Methods of Scoring Objective-Test Items 55
Review Questions 56 


4. ITEM ANALYSIS 57
Meaning and Purpose of Item Analysis 57
Power Tests 58
Item Difficulty 58
Optimal Difficulty Value for a Reliable Test 63
Index of Discrimination 63
A Simplified Item Analysis Form 67
Effectiveness of Distractors or Foils (Distractor Analysis) 69
Speed Tests 70
Factors Influencing the Index of Difficulty and the Index of Discrimination 72
Problems of Item Analysis 72
Important Interactions Among Item Characteristics 74
The Item Characteristic Curve (ICC) and Item Response Theory 75
Review Questions 78


5. RELIABILITY 79


History and Theory of Reliability 79 

Meaning of Reliability 82 

Methods (or Types) of Reliability 85 

What is a Satisfactory Size for the Reliability Coefficient? 95 
Standard Error of Measurement 96 

Reliability of Speed Test 97 


Factors Influencing Reliability of Test Scores 97 


How to Improve Reliability of Test Scores? 100 
Estimation of True Scores 101


Index of Reliability 102 
Reliability of Difference Score 102 
Reliability of Composite Score 103 
Review Questions 103 


6. VALIDITY 104 


Meaning of Validity 104 
Aspects of Validity 105 


Convergent Validation and Discriminant Validation 114 
Statistical Methods for Calculating Validity 117 

Factors Influencing Validity 118 

Concept of Cross-Validation 121 

Extravalidity Concerns 122 

Relation of Validity to Reliability 123 

Review Questions 124 


7. NORMS AND TEST SCALES 125


Meaning of Norm-Referencing and Criterion-Referencing 125
Steps in Developing Norms 127 


Types of Norms and Test Scales 127 

Computer Applications In Psychological Testing and Assessment 136 
Criteria of Good Test Norms 138 

Review Questions 139 


8. RESPONSE SET IN TEST SCORES 140 


Meaning of Response Sets 140 
Types of Response Sets 141 
Implications of Response Sets 143 


Methods to Eliminate Response Sets 143 
Review Questions 146 


PART TWO: Principles of Measurement 


9. MEASUREMENT OF INTELLIGENCE, APTITUDE AND ACHIEVEMENT 149 
Different Viewpoints Towards Intelligence 149 
Types of Intelligence Tests 157 


Types of Intelligence Test Scores 175
Distinction Between Aptitude Test and Achievement Test 177
Types of Aptitude and Achievement Tests 178
Tests of Creativity 184
Review Questions 185


10. MEASUREMENT OF PERSONALITY 186


Meaning and Purpose of Personality Measurement 186 

Tools of Personality Assessment 187

Popular Strategies Involved in Construction of Personality Inventories 188 
clactlih WwieS Tt 


Overcoming Distortions in Self-report Inventories 206 
Situational Tests 207


Measurement of Interests, Values and Attitudes 209 


Review Questions 218 


11. PROJECTIVE TECHNIQUES 219
Meaning and Types of Projective Techniques 219
Classification of Projective Techniques 220
Pictorial Techniques 224
The Rorschach Test 224
Interpretation of the Rorschach Protocol 241
The Holtzman Inkblot Test 252
Thematic Apperception Test (TAT) 254
Verbal Techniques 260
Expressive Techniques 261
Evaluation of Projective Techniques 264
Review Questions 266


12. TECHNIQUES OF OBSERVATION AND DATA COLLECTION 267
Methods of Data Collection 268 
Questionnaire and Schedule 269 
Interview 275 
Content Analysis 280 
Observation as a Tool of Data Collection 282 


Difference Between Participant Observation and 
Nonparticipant Observation 285 

Rating Scale 285 

Types of Rating Scales 286 

Other Special Types of Rating Scales 293 

Problems in Obtaining Effective Ratings 305 

Methods of Improving Effectiveness of Rating Scales 307 
Errors in Ratings 308 

Evaluation of Rating Scales 310 

Review Questions 313 


13. SCALING TECHNIQUES 315 
Distinction Between Psychophysical and 
Psychological Scaling Methods 315 
Psychophysical Scaling Methods 316 
Weber's Law and Fechner's Law 333 
Stevens' Power Law 335







Newer Psychophysical Methods 336 
Psychological Scaling Methods 337 
Methods of Attitude Scales or Opinionnaires 352 
Review Questions 362 


PART THREE: Principles of Research Methodology 


14. SAMPLING 365 
Meaning and Types of Sampling 365 
Need for Sampling 367 
Fundamentals of Sampling 368 
Principles of Sampling 369 
Factors Influencing Decision to Sample 371 
How Large should a Sample Be? 372 
Methods of Drawing Random Samples 374 
Simple Random Sample 375 
Stratified Random Sample 378 
Area (or Cluster) Sampling 381 
Quota Sampling 382 
Purposive or Judgemental Sampling 383 
Accidental Sampling 384 
Snowball Sampling 384 
Saturation Sampling and Dense Sampling 385 
Double Sampling 386 
Mixed Sampling 386 
Requisites of a Good Sampling Method 387 
Common Advantages of Sampling Methods 388 
Sampling Distribution 388 
Sampling Error 389 
Review Questions 391 


15. SOCIAL SCIENTIFIC RESEARCH 392 
Meaning and Characteristics of Scientific Research 392 
Scientific Approach to the Study of Behaviour 394 
Validity in Research or Experimental Validity 397
Controlling Threats to Reliability and Validity in Research 401 
Phases or Stages in Research 403 
Types of Educational Research 406 
Types of Research: Experimental and Nonexperimental 406 
Difference Between Research Method and Research Methodology 424 
Ethical Problems in Research 425 
Comparison Between Experimental and 
Nonexperimental Research 427 
Types of Experiment 428 
Types of Applied Research 429 
Review Questions 431 


16. SINGLE-SUBJECT EXPERIMENTAL RESEARCH AND SMALL N RESEARCH 433 


Meaning and Origin of Single-Subject Experimental Research 433 
General Procedures of Single-Subject Experimental Research 434 




Basic Designs of Single-Subject Experimental Research 435 
Data-Collection Strategies in Single-Subject Experimental Research 439 
Evaluating Data in Single-Subject Experimental Research 440 

Strengths and Weaknesses of Single-Subject Experimental Research 441
Single-Subject Research and Large N Research 443
Small N Design: Nature and Historical Perspectives 444
Review Questions 445


17. HISTORICAL RESEARCH 446 


Meaning of Historical Research and its Necessity 446
Steps in Historical Research 447
Sources of Historical Data 447
Historical Criticism 448
Limitations of Historical Research 449
Review Questions 450


18. THE PROBLEM AND THE HYPOTHESIS 451
Meaning and Characteristics of a Research Problem 451
Sources of Stating a Research Problem 453
Important Considerations in Selecting a Research Problem 454
Ways in Which a Problem is Manifested 455
Types of Research Problems 456
Importance of Formulating a Research Problem 457
Steps in Formulating a Research Problem 457
Meaning and Characteristics of a Good Hypothesis 458
Formulating a Hypothesis 460
Ways of Stating a Hypothesis 464
Types of Hypotheses 461
Sources of Hypotheses 463
Functions of Hypotheses 464
Review Questions 465


19. REVIEWING THE LITERATURE 466 
Purpose of the Review 466 
Types of Literature Review 467 
Sources of the Review 467
Types of Literature 469 
Writing Process of the Literature Review 470 
How Old should the Literature be? 471 


Preparation of Index Card for Reviewing and Abstracting 471 
Abstract 472 


Review Questions 473 
20. VARIABLES 474 


Meaning and Types of Variables 474 

Difference between a Variable and a Concept 481 

Methods of Measuring Dependent Variables 481 

Important Considerations in Selection of Variables 482 

Important Approaches to Manipulating Independent Variables 483 
Techniques of Controlling Extraneous Variables 484 

Controlling Demand Characteristics 487 

Review Questions 490 







21. RESEARCH DESIGN 491 
Meaning and Purpose of Research Design 492 
Criteria of Research Design 494 
Basic Principles of Experimental Design 495 
Basic Terms used in Experimental Design 496 
Some Important Types of Research Design 500 
Between-subjects Design 501
Problem of Creating Equivalent Groups in Between-subjects Design 534
Within-subjects Design 535
Problem of Controlling Sequence Effects in Within-subjects Design 536 
Comparison of Between-subjects Design and 
Within-subjects Design 538 
Experimental Design based upon the Campbell and 
Stanley Classification 539 
Pre-Experimental Design (Nondesigns) 540 
True Experimental Design 541 
Quasi-Experimental Designs 543 
Ex Post Facto Design 550 
Steps in Experimentation 551 
Review Questions 554 


22. QUALITATIVE RESEARCH 555 


Meaning and Essential Features of Qualitative Research 555 
A Qualitative Research Model: Five Components 556 
Relevance of Qualitative Research 558 

Qualitative Research: A Brief Historical Introduction 558 
Themes of Qualitative Research 560 

Theoretical Perspectives of Qualitative Research 562 
Research Design Strategies of Qualitative Research 564 
Sampling Techniques of Qualitative Research 569 

Data Collection Techniques in Qualitative Research 572
Data Analysis and Interpretation 578 

Comparison of Methods of Qualitative and Quantitative Data Analysis 587 
Combining Qualitative and Quantitative Approaches 589 
Review Questions 591 


23. CARRYING OUT STATISTICAL ANALYSES 593 
Sample and Population 593 
Normal Curve 594 
Measures of Relative Position: Standard Scores 600 
Parametric and Nonparametric Statistical Tests 601 
Parametric Statistics 604 
Nonparametric Statistics 641 
Correlation and Regression 659 
Major Terms and Issues in Correlation and Regression 660
Choosing Appropriate Statistical Tests 664 
Review Questions 668 


24. WRITING A RESEARCH REPORT AND A RESEARCH PROPOSAL 669 
General Purpose of Writing a Research Report 669 


Structure or Format of a Research Report (Style Manual) 669
Style of Writing a Research Report 674
Typing the Research Report 676
Evaluating a Research Report 676
Preparing a Research Proposal 676
Review Questions 679


OBJECTIVE QUESTIONS 683 
APPENDICES AND REFERENCES 711 
GLOSSARY 753 

SUBJECT INDEX 777 







Part One 
PRINCIPLES OF 
TEST CONSTRUCTION 





(i) Nominal or Classificatory Measurement
(ii) Ordinal Measurement
(iii) Interval or Equal Interval Measurement
(iv) Ratio Measurement
Properties of Scales of Measurement
Functions of Measurement
Distinction between Psychological Measurement and Physical Measurement
Steps in Measurement Process
Problems Related to Measurement Process
General Problems of Measurement
(i) Indirectness of Measurement
(ii) Incompleteness of Measurement
(iii) Relativity of Measurement
(iv) Errors in Measurement
Sources of Errors in Measurement
Difference Among Assessment, Testing and Measurement


MEASUREMENT AND EVALUATION 


In everyday life, the terms 'measurement' and 'evaluation' are often used interchangeably.
However, in psychological, sociological and educational research these two terms are used
separately because they connote two different meanings. 'Measurement' refers to the process of
assigning numerals to events, objects, etc., according to certain rules. Tyler (1963, 7) defines
measurement as "assignment of numerals according to rules". A still more elaborate and wider
definition has been given by Nunnally (1979): "Measurement consists of rules for assigning
numbers to objects in such a way as to represent quantities of attributes." An analysis of the
definition of measurement given by Nunnally reveals the following main properties
of measurement.


1. In the process of measurement, numbers are assigned according to some rules.
A number is a kind of numeral to which some quantitative meaning is assigned. In the process of
measurement the investigator assigns numbers, not of his own choice, but according to
fixed and explicit rules. Rules are the procedures to transform qualities of attributes into numbers
(Nunnally & Bernstein 1994; Rosenthal, Rosnow & Rubin 2000). In some cases the rules are very
explicit, as when one is measuring height in feet and inches. In other cases they are not, as
when one has to measure the extroversion trait of personality or the










Tests, Measurements and Research Methods in Behavioural Sciences


intelligence of a child. Obviously, in such a situation the rules would not be as clear as in the
first example. For measuring psychological, sociological and educational attributes, rules are
generally vague and less explicit.


2. Measurement is always concerned with certain attributes or features of the object. It is
these attributes or features of the object which are measured and not the object itself. For
example, one would measure the aptitude, intelligence, attitude, etc., of a person and not the
person himself. When an investigator is measuring the attribute of a person, he may face two
difficulties. First, he may be asked to measure an attribute, the existence of which is itself in
question. Extrasensory perception is one such example. Many investigators doubt the existence
of such perception in many individuals. In such a case, measurement is a difficult, if not an
impossible, task. Second, the investigator may be asked to measure attributes which are not
unitary; rather, they are a mixture of several subattributes. Usually, this happens when one is
asked to measure 'personality adjustment'. Adjustment may relate to several areas,
each of which requires a separate measurement, and any attempt to measure them together
may create difficulty. In such a situation the investigator can, however, measure them
with more sophisticated instruments specially designed for the purpose.


3. Numbers are used to represent quantities of attributes; in other words, measurement
involves the process of quantification. Quantification indicates how much or to what extent
that particular attribute is present in a particular object. For example,
when the investigator is measuring the achievement of a child in arithmetic, he quantifies it by
saying that the child has 80% marks in his class. The percentage indicates how much
arithmetical knowledge he has gained in the class.
Measurement is different from its so-called synonym 'evaluation'. By evaluation is meant an
appraisal or assessment with respect to some standard. Tuckman (1975, 12) defines evaluation as
"a process wherein the parts, processes or outcomes of a programme are examined to see
whether they are satisfactory, particularly with reference to the programme's stated objectives,
our own expectations or our own standards of excellence." Thus evaluation involves a process of
appraisal of an object or event with reference to some standard. The standard may be social,
cultural or scientific. The standard may also be true or arbitrary. An investigator may measure the
height of a 6-year-old child (which is, say, 30") and type him as short. A typist typing 80 words per
minute may be described as a 'Grade A' typist. Description of the height of the child (which is
30") and the typing speed of the typist (which is 80 words per minute) are examples of
measurement. However, when the child is said to be short or the typist is classified as a 'Grade A'
typist, it means the performance of the typist and the height of the child are being compared with
reference to some standard. A child is short because he is shorter than the general mean height of
children of his age group and the typist is a 'Grade A' typist because his speed is faster than the
average speed of most typists. Thus, the height of the child and the typing behaviour of the typist
are being evaluated (and are not being measured).
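
The distinction drawn above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Standard`, `measure_typing_speed` and `evaluate`, and the cutoff of 60 words per minute, are hypothetical and do not come from the text. Measurement assigns a number by an explicit rule; evaluation compares that number with a standard.

```python
from dataclasses import dataclass


@dataclass
class Standard:
    """A hypothetical reference standard used for appraisal."""
    cutoff: float           # assumed value; the text does not give one
    label_at_or_above: str
    label_below: str


def measure_typing_speed(words_typed: int, minutes: float) -> float:
    """Measurement: assign a number (words per minute) by an explicit rule."""
    return words_typed / minutes


def evaluate(value: float, standard: Standard) -> str:
    """Evaluation: appraise a measured value with reference to a standard."""
    if value >= standard.cutoff:
        return standard.label_at_or_above
    return standard.label_below


# Assumed, arbitrary standard: 60 words per minute marks a 'Grade A' typist.
typing_standard = Standard(cutoff=60.0,
                           label_at_or_above="Grade A",
                           label_below="below Grade A")

speed = measure_typing_speed(words_typed=400, minutes=5)  # measurement
grade = evaluate(speed, typing_standard)                   # evaluation
print(speed, grade)
```

Changing the cutoff changes the evaluation but leaves the measurement untouched, which is the point of the paragraph above: the standard may be true or arbitrary, while the measured value follows from the rule alone.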


The History of Psychological Measurement and Mental Testing 


Measurement is the root of both biological and behavioural sciences. In biological science, 
measurement has a long history but the history of psychological and educational measurement 
dates back to around 2200 BC, when Chinese officials were examined every third year to
determine their fitness for office (Gregory 2005). During the Han Dynasty period (206 BC to 220 
AD), the use of test batteries became very common. Tests had become an established and
accepted instrument for such diverse topics as civil law, military affairs, revenue, agriculture,
etc. During the Ming Dynasty (1368-1644 AD), tests for these various services were used as an
accepted norm. During this period, a national multistage testing programme involved local and
regional testing centres. Those who passed the tests at the local level went on to provincial
capitals for extensive essay examinations, and those who passed the provincial examinations
went on to the nation's capital for a final round. Only those who passed this third set of tests
were declared fit for public offices.

The Western countries learnt about these testing programmes through the Chinese. The
British government encouraged the East India Company in 1842 to follow the Chinese system as a
method of selecting employees for overseas duty. Later, the British government itself adopted a
similar system of testing for its civil services in 1855. After this, the French and German
governments also followed suit. In 1883, the US government established the American Civil
Service Commission that conducted various competitive examinations for certain government
jobs. The impact of such testing programmes grew rapidly in other Western countries, too.

Most historians today fully agree that the beginning of psychological testing lies in the
experimental investigation of individual differences that flourished in Germany and Great Britain
after 1850. Prior to 1850, psychology was not considered to be an independent science but just
a branch of philosophy. Any attempt to measure human behaviour through experiments was
ridiculed by philosophers. According to Kant, measuring human behaviour was useless, because
human reactions were such that they could not be observed and measured directly. But
beginning with 1850, psychology began to shake off this old relation with philosophy and started
forming a new alliance with biological sciences. The concepts that were discovered in the fields
of physics, chemistry and biology were proving to be of immense help in measuring human
characteristics such as forgetfulness, intelligence, response time, sensation, etc.
Biologists and physicists were pioneers in conducting experiments for measuring human
behaviour. Influenced by their experiments, psychology started getting measurement-conscious
and the movement of measurement was being spread through experimentation and
demonstrations. As a matter of fact, the history of psychological and educational measurement
during the period 1850 to 1900 was nourished by three major influences, namely,
psychophysics, Darwinian biology and clinical practices.

The nineteenth century, particularly the second half, had witnessed the development of a
number of scientific tools and innovations in physics and chemistry, and scientists were of the
opinion that those physical methods could also be successfully tried in measuring psychological
properties. As a result, psychophysics was born and experimental psychologists began to study
the relationship between physical stimuli such as the intensity of light or sound, and the
experiences or sensations produced by these stimuli. Many new experimental techniques of
measurement followed these psychophysical experimentations. In 1885, the German physician
Hubert von Grashey developed what is called the antecedent of the memory drum as a means of
testing patients with brain injury. His report revealed that many patients could clearly recognize
stimuli but could not identify them when displayed through the moving slot. Likewise, German
psychiatrist Conrad Rieger developed another test battery for the brain damaged. The demerit of
this test was that it took over 100 hours in its administration and therefore soon went out of use.


Besides psychophysical methods, some early physiologists were also influencing the
development of psychological measurement. These physiologists were mainly interested in
measuring the processes of seeing, hearing, speed of conduction, etc. As a result, the first
laboratory for studying psychological reactions was established in 1879 by Wilhelm Wundt at the
University of Leipzig (known as Karl Marx University from 1953 to 1991). Although the laboratory
was established for measuring and studying psychological reactions, Wundt and his disciples
mostly concentrated upon physiological reactions only. Gradually, however, this laboratory
flourished and developed into an excellent centre for the measurement of human reactions and
invited the attention of several experimenters from abroad, too. Although Wundt is credited with
founding the first psychological laboratory in 1879 in Leipzig, very few know that he was
measuring mental processes for years, at least as early as 1862. He conducted an experiment
with his thought meter, which was a calibrated pendulum with a needle swinging back and forth
and striking the bells. The purpose of this thought meter was to assess the swiftness of the
thought of the testee. Wundt's claim was that the difference between the observed pendulum
position and the actual position provided a measure of the swiftness of thought of the testee. This
work of Wundt's was relevant for studying the astronomers' longstanding problem. The problem
was that when two astronomers simultaneously used the same telescope for noting the time at
which a star moved across a grid line on the telescope, there existed a difference in their
readings. Even before Wundt's time, a remarkable event occurred in the history of science:
Kinnebrook, an assistant at the Royal Observatory in England, was dismissed in 1796 simply
because his assessment of stellar crossing times was about one second slower (Boring 1950).

Darwin's theory of evolution also provided a big impetus to psychological measurement.
Though it originated in biology, it influenced the thinking of many scientists working in the
field of psychology, too. One of the basic notions of the theory of evolution was
that members of the same species are not alike, that is, individual differences exist among
members of a species. Sir Francis Galton, a half-cousin of Charles Darwin, was quick to pick up
the idea of individual difference and gathered data on the individual differences in physical and
psychological characteristics. He wanted to measure the basic individual differences among
human beings and for this he singled out 'human ability' as a possible dimension and, in
doing so, laid the foundation stone of the branch of psychology which is today known as
psychometrics. In fact, Galton was obsessed with the idea of measurement and had a belief
that virtually anything was measurable. His attempts to measure intelligence by means of reaction
time (RT) and sensory discrimination tasks were well-known. His most influential works are
Hereditary Genius (1869) and Inquiries into Human Faculty and its Development (1883). The
former book emphasized the genetic factors which were held to be important for eminence, and
the latter was a series of essays that emphasized individual differences in mental abilities. With
the help of time-consuming psychophysical methods practiced by Wundt and others, he tried to
assess simple sensory-motor responses. Thus, he continued the tradition of brass instruments
mental testing but with one difference: his procedure was modified in such a way that it was
amenable to data collection from hundreds, if not thousands, of the testees. Due to his efforts in
popularizing practicable measures of individual differences, Galton is regarded by most
historians as the father of mental testing (Goodenough 1949).

To continue his study of individual differences, Galton set up a psychometric laboratory in
London at the International Health Exhibition in 1884. Here the tests and measures being
administered involved both the physical and behavioural measures. However, in the long run,
Galton's attempt to assess intellect with the measures of reaction time and sensory discrimination
proved to be a fruitless effort.

Later, Karl Pearson developed several sophisticated techniques for statistically analyzing the
data on individual differences of human ability, which provided a good stream for psychological
measurement. The product-moment method of computing the correlation coefficient was one
such technique. J M Cattell, who had studied with both Wundt and Galton, continued the testing
and measurement programme in a scientific way. He invented the term 'mental test' in his most
important paper Mental Tests and Measurements. He started his own research laboratory at
Columbia University, at that time the largest university in America. There, he developed a battery
of tests that were mainly extensions and additions to Galton's theory. In fact, the physiological
and sensory bias of his test battery strongly reflects its Galtonian heritage. Thus, he
tried his best to import the brass instruments approach to America. Among the most influential
students of Cattell were E L Thorndike, R S Woodworth and E K Strong who, by devising various
tests, kept the spirit of psychological testing and measurement alive. But among Cattell's
students, it was probably Clark Wissler who showed some surprising results in his research and
had the greatest influence on the early history of testing and measurement. In fact, his aim was
to demonstrate that the test scores could predict academic performance. His analysis showed
virtually no tendency for mental test scores to correlate with academic performance, which
thereby redirected the mental testing movement away from brass instruments. Also damaging to
the brass instrument testing movement were the very low correlations between the mental tests
themselves. For example, RT and colour naming correlated -0.15 and colour naming and hand
movement correlated only 0.19. Thus, with the publication of such discouraging results, the
experimental psychologists finally dropped the idea of using RT and sensory discrimination as
measures of intelligence. One point of view is also that this turning away from the brass
instrument approach was important as it paved the way for immediate acceptance of Alfred
Binet's tests as a useful measure of human ability.
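
Correlations such as the -0.15 and 0.19 quoted above come from the product-moment method. A minimal Python sketch of that computation follows. The function name `pearson_r` and the two score lists are hypothetical, invented only for illustration; they are not Wissler's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for four testees (assumed values, not from the text).
test_scores = [1, 2, 3, 4]
class_marks = [1, -1, -1, 1]
print(pearson_r(test_scores, class_marks))
```

A coefficient near zero, as with these made-up scores, means that knowing a testee's standing on one measure says almost nothing about his standing on the other, which is how Wissler's low coefficients were read.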
Apart from the above main streams of psychological measurement, clinical studies growing
out of medicine, psychiatry and social welfare institutions in western countries also provided
immense impetus for the growth of measurement. Clinical interest in the feeble-minded, insane
and unfit was first evoked in about 1900 in France. Psychologists working there were deeply
interested in developing tests and instruments which could measure maladjustments and identify
the possible reason for insanity. Alfred Binet in 1905 gathered data based upon the intelligence of
school-going children in this context, which was later on manifested in terms of the development
of different intelligence scales.

The beginning of the twentieth century was marked by a tremendous growth of
psychological and educational tests, and the development of measurement tools and techniques
proceeded on a war footing. The first psychological test to appear was the Binet-Simon Intelligence
Scale, considered to be one of the most promising verbal scales for measuring the intelligence of
children. It was a 30-item measure of intelligence. Curiously, there was no method of scoring the
test (Gregory 2005). It simply aimed at identifying those school-going children who could not
benefit from regular instructions in class. It has been revised several times. Its 1912 revision was
important because in this revision Stern introduced the concept of IQ.



Later on, several American revisions of the scale appeared. Besides this, several other
achievement tests, including the famous Thorndike Handwriting Scale, appeared. In 1918,
Woodworth's Personal Data Sheet appeared, which was the first personality inventory. During the
First World War, two well-known tests, the Army Alpha test and the Army Beta test, were developed.
Several group intelligence tests and achievement tests also made their appearances. The
outcome of all these enthusiastic developments in the sphere of psychological tests was a rapid
development of psychological and educational measurements. During the last 30 years or so, a
different trend has become evident. Much emphasis is now being put upon the methodology of
development and use of measurement in almost all branches of psychology. Newer methods of
development of psychological and educational tests have become the main centre of interest for
psychologists. They are frequently found criticising the old methods and arguing for
the adoption of a new method. This methodology is known as the psychometric theory, which
mainly includes the basic principles of test construction.


For a still better understanding, the history of mental testing in the 20th century can be sketched
and explained by dividing it into six periods (Thorndike and Thorndike-Christ 2015): early period,
boom period, first period of criticism, battery period, second period of criticism and the period of
accountability. A brief discussion follows.


(i) The Early Period: This period extends from 1901 to 1915 and was, in fact, a period of tentative exploration and theory development. The Binet-Simon scale was published in 1905 and was revised twice by Binet. The test was so popular that it was brought to the USA by several pioneers in the field of mental testing and measurement. Of the various pioneers, Lewis Terman of Stanford University was most influential. In 1916, he published the first version of the test, known as the Stanford-Binet. Arthur Otis began work in testing mental ability [...]. [...] did commendable work in Australia [...] with people having hearing or language [...].

Another influential person in this field was Charles Spearman, who developed two theories relating to [...]: a statistical theory of reliability, and a theory of a single dimension of ability underlying most [...]. The consistency of people's performance on different ability measures was taken to be the result of this general intelligence (popularly called the g-factor of intelligence).
(ii) The Boom Period: This period started with American involvement in World War I. In fact, this involvement created an urgent need to improve the functioning of the army, and psychologists were considered the persons most fit to carry out this improvement programme. This event and the recognition of the role of psychologists created a 16-year boom period (that is, up to almost 1930) which witnessed many types of advances and innovations in the field of mental testing and measurement. A group of psychologists led by Robert Yerkes expanded Otis's work to develop and implement large-scale group testing of ability. Two popular tests, the Army Alpha test (a verbal test) and the Army Beta test (a performance test), were developed. [...] emotionally unstable [...] military service, the first objective measure of personality, called the Woodworth Personal Data Sheet, was developed in 1918.

After the war, psychologists showed interest in studying a variety of behaviours such as interests and attitudes. E K Strong and some of his colleagues developed the Strong Vocational Interest Blank (SVIB) shortly after World War I for assessing vocational interests to help college students choose careers consistent with their interests. Similarly, in 1929, L L Thurstone proposed methods [...] values. However, the period following World War I witnessed a low point in the history of the mental testing movement because some critics like Walter Lippmann criticised both these tests as well as the conclusions drawn from the test scores.


(iii) The First Period of Criticism: This period roughly covered the 1930s and has been described as a period of criticism and consolidation. Many new tests were published and some were revised during this period. The most popular personality test, the Minnesota Multiphasic Personality Inventory (MMPI), was developed and the Wechsler-Bellevue Intelligence Scale also came up during this period. The Kuder Preference Record was also developed. Some advances were also made in the mathematical theory underlying tests. For example, L L Thurstone [...] objectified and supported.


(iv) The Battery Period: During the 1940s and 1950s, again, widespread use of psychological and educational tests was started. Based on the theory of Thurstone and others that there were several dimensions of ability, some test batteries were developed. The success of these test batteries encouraged the testing movement further, and experts were showing much interest in developing test batteries and the related statistical technique called factor analysis. Taxonomies of ability such as those provided by Bloom and Guilford were found very helpful in describing the range of various types of mental functioning.

During the 1950s, psychological and educational testing became much like a business concern. Many test batteries were being used for assessing the academic achievement of students in [...]. Business organisations, industry and the civil service system increasingly made use of personality, ability and aptitude tests for assessing various mental functions. In this direction, the use of the Differential Aptitude Test (DAT) and the General Aptitude Test Battery (GATB) was most common. Not only this, in mental hospitals, patients were routinely assessed through various personality tests and ability tests. In 1954, the American Psychological Association also released a set of guidelines for effective use of various psychological and educational tests.
(v) The Second Period of Criticism: The period started in the 1960s when there was an underlying revolt against [...] dominance, and tests were considered as [...]. [...] tests, the scores of females were [...] than the scores of males. Likewise, personality tests were also thought of as unnecessarily [...] the male personality. At that time, the most serious concern was the possible use of ability and personality tests in discriminating against females or members of minority groups in employment and education. As a consequence, tests were being subjected to various [...]. Some legal proceedings in the courts of law were also started in the USA.

(vi) The Period of Accountability: Despite the fact that there was public criticism of [...] measurement tools, the government in the USA relied more and more on the use of standardized psychological and educational tests in assessing the school achievements of students. In 2002, the NCLB (No Child Left Behind) bill was passed by the US government that [...] that all public school students must achieve high standards [...] mathematics, arts, etc. Low-performing schools [...] interventions ranging from technical assistance to various types of restructuring programmes. Schools must be held accountable for the performance of students and they can assess their performance by using standardized tests.


LEVELS OF MEASUREMENT (OR MEASUREMENT SCALES) 


Before we go on to discuss the different levels of measurement scales, we must know something about the different postulates of measurement. A postulate is defined as a sort of assumption which stipulates the relationship between groups, objects or events being measured. The postulates will help us to understand the different relationships among the objects measured by the different levels of measurement scales. According to Guilford (1954, 11), there are nine basic postulates of measurement. These are briefly summarized under the following three general headings.

1. Postulates relating to equalities or identities
2. Postulates relating to rank order
3. Postulates relating to additivity

Ordinarily, there are three variations in postulates relating to equalities or identities, two variations in postulates relating to rank order and four variations in postulates relating to additivity. Thus a total of nine postulates of measurement exist.

The three variations in postulates relating to identities or equalities are as follows:

1. Either a = b or a ≠ b. This means numbers are either equal (a = b) or they are not equal (a ≠ b) but not both. This postulate is more essential for classification.

2. If a = b then b = a. This means that the relation of equality among numbers is symmetrical, and therefore, we can interchange a for b or b for a.

3. If a = b, b = c then a = c. This means that the objects which are equal to the same object are also equal to one another.

The two postulates relating to rank are:

4. If a > b then b ≯ a. This means that the relation between a and b is asymmetrical. The relation a > b indicates a definite order, and we cannot reverse the relationship to b > a or a < b.

5. If a > b, b > c then a > c. This is the transitivity postulate and is most important for ranking. It is an important postulate upon which most measurements in psychology, sociology and education are dependent.

The four postulates relating to additivity are as follows:

6. If a = p and b > 0 then a + b > p. This postulate indicates the summation process and denotes that when zero is added, it produces no variability in the result.

7. a + b = b + a. This postulate indicates that in the process of addition, the order of the numbers is not important because 4 + 2 is equal to 6, and 2 + 4 is also equal to 6.

8. If a = p and b = q, then a + b = p + q. This postulate indicates that in the process of addition, identical numbers or objects may be substituted for one another without making any change in the result.

9. (a + b) + c = a + (b + c). This postulate indicates that in the process of addition, the order of combinations of objects or numbers makes no difference.

As we know, measurement is the process of assigning numerals to the attributes of objects according to some rules. Following Stevens (1951), there are at least four different ways of assigning numerals to the attributes of the objects. Based upon these four ways, there are four different levels of measurement (or scales). The first five postulates are enough to account for the following four levels of measurement.
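The nine postulates can be checked mechanically for ordinary numbers. The following is only an illustrative sketch: the function name `check_postulates` and the sample values are assumptions made for this example and do not come from Guilford. Integers are used so that the additivity checks are exact.

```python
from itertools import product

def check_postulates(values):
    """Check the nine postulates on every triple (a, b, c) drawn from `values`.

    Returns a dict mapping the postulate number (1-9) to True or False.
    For postulates 6 and 8 we take p = a and q = b.
    """
    triples = list(product(values, repeat=3))
    return {
        # Postulates relating to equalities or identities
        1: all((a == b) != (a != b) for a, b, _ in triples),
        2: all(b == a for a, b, _ in triples if a == b),
        3: all(a == c for a, b, c in triples if a == b and b == c),
        # Postulates relating to rank order
        4: all(not (b > a) for a, b, _ in triples if a > b),
        5: all(a > c for a, b, c in triples if a > b and b > c),
        # Postulates relating to additivity
        6: all(a + b > a for a, b, _ in triples if b > 0),
        7: all(a + b == b + a for a, b, _ in triples),
        8: all(a + b == p + q for a, b, _ in triples for p, q in [(a, b)]),
        9: all((a + b) + c == a + (b + c) for a, b, c in triples),
    }

# Hypothetical sample values, including zero
results = check_postulates([0, 2, 4, 7])
print(results)
```

Ordinary integers satisfy all nine postulates. The levels of measurement differ in how many of these postulates can be meaningfully assumed for the attribute being measured.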


Nominal or Classificatory Scale of Measurement
Nominal measurement (or scale) is the lowest level of measurement. In nominal measurement numbers are used to name, identify or classify persons, objects, groups, etc. Nominal scales are really not scales and their only purpose is to name objects. For example, a sample of persons being studied may be classified as (a) Hindu, (b) Muslim and (c) Sikh or, the same sample may be classified on the basis of sex, rural-urban variable, political party affiliation, etc. All these classifications would be examples of nominal measurement. Classification of persons into clinical groups such as schizophrenia, manic-depressive psychosis, phobia, etc., also constitutes nominal measurement. In nominal measurement, members of any two groups are never equivalent but all members of any one group are always equivalent. And this equivalence relationship is reflexive, transitive and symmetrical. Nominal data are counted data. In the case of nominal measurement, admissible statistical operations are counting or frequency, percentage, proportion, mode, and coefficient of contingency. Addition, subtraction, multiplication and division are not possible because the identifying numerals themselves cannot be legitimately added, subtracted, multiplied or divided. The drawback of nominal measurement is that it is most elementary and simple. It is the least precise method of quantification. Because of these characteristics, some experts are of the view that nominal measurement is not a measurement at all.
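The admissible operations on nominal data (frequency, percentage and mode) can be sketched as follows. The sample of six persons is hypothetical and is used only for illustration.

```python
from collections import Counter

# Hypothetical sample classified on a nominal variable
religion = ["Hindu", "Muslim", "Sikh", "Hindu", "Hindu", "Muslim"]

counts = Counter(religion)                       # frequency of each category
n = len(religion)
percentages = {category: 100 * freq / n for category, freq in counts.items()}
mode = counts.most_common(1)[0][0]               # most frequent category

print(counts)
print(percentages)
print(mode)
# Averaging category labels (or numeric codes standing for them) would be
# meaningless, because the codes only name the groups.
```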


Ordinal Scale of Measurement
This is the second level of measurement in which there is the property of magnitude but not of equal intervals or an absolute zero. In ordinal measurement, numbers denote the rank order of the objects or individuals. Here numbers are arranged from highest to lowest or from lowest to highest. Ordinal measures reflect which person or object is larger or smaller, heavier or lighter, brighter or duller, harder or softer, etc., than others. Persons may be grouped according to physical or psychological traits to convey a relationship like 'greater than' or 'lesser than'. Socio-economic status is a good example of ordinal measurement because every member of the upper class is higher in social prestige than every member of the middle and lower class. Likewise, every member of the middle class is higher in social prestige than every member of the lower class. Students may be ranked 1st, 2nd and 3rd in terms of their academic achievement to constitute an example of ordinal measurement. Thus, in ordinal measurement, besides the relationship of equivalence, a relationship of 'greater than' or 'lesser than' exists because all members of any particular subclass are equivalent to each other and at the same time greater or lesser than the members of other subclasses. The relationship of greater than is usually irreflexive, transitive and asymmetrical. The permissible statistical operations in ordinal measurement are median, percentiles and rank correlation coefficients, plus all those which are permissible for nominal measurement.

The drawback of ordinal measurement is that ordinal measures are not absolute quantities, nor do they convey that the distance between the different rank values is equal. This is because ordinal measurements are not equal-interval measurements, nor do they incorporate an absolute zero point. Let us take an example. Suppose an honesty test is given to a sample of 30 students. A and B are given the rank of 3 and 7 respectively, whereas D and Z are given the rank of 11 and 15 respectively. In this example, the ranks of A and B are four points apart, and similarly the ranks of D and Z are four points apart. But from this we cannot say that the difference between A and B, and that between D and Z is equal because in ordinal measurement, rank values which may be equally spaced do not convey that the underlying properties will also be equal. Likewise, intelligence scores such as 50, 40, 20 of X, Y and Z respectively may be ranked as 1, 2 and 3 respectively. The difference between 50 and 40 is 10 and the difference between 40 and 20 is 20. But both these differences are equally spaced in terms of ranks. Thus in ordinal measurement from equally-spaced rank values, it is difficult to infer that the underlying trait will also be equally spaced. The second demerit of ordinal measurement is that there is no way to ascertain whether a person has any of the characteristics being measured.
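The point that equally spaced ranks can hide unequal differences in the underlying scores can be illustrated with the intelligence scores 50, 40 and 20 from the example above. This is a minimal sketch; the variable names are chosen only for this illustration.

```python
# Intelligence scores of X, Y and Z from the example in the text
scores = {"X": 50, "Y": 40, "Z": 20}

# Rank from highest to lowest score (1 = highest)
ordered = sorted(scores, key=scores.get, reverse=True)
ranks = {name: position + 1 for position, name in enumerate(ordered)}

# Gaps between neighbours, first in ranks and then in the original scores
neighbours = list(zip(ordered, ordered[1:]))
rank_gaps = [ranks[lower] - ranks[higher] for higher, lower in neighbours]
score_gaps = [scores[higher] - scores[lower] for higher, lower in neighbours]

print(ranks)       # rank order of the three persons
print(rank_gaps)   # spacing between adjacent ranks
print(score_gaps)  # spacing between the underlying scores
```

The rank gaps are identical while the score gaps are not, which is why differences between ranks should not be read as differences in the amount of the trait.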


Interval or Equal-interval Scale of Measurement
This is the third level of measurement and includes all the characteristics of the nominal and ordinal scales of measurement. The salient feature of interval measurement is that numerically equal distances on the scale indicate equal distances in the properties of the objects being measured. In other words, here the unit of measurement is constant and equal. This is the reason why interval measurement is also referred to as equal-interval measurement. Since the numbers are at equal intervals, they can legitimately be added and subtracted from each other. For example, suppose four objects A, B, C and D have been measured and given the scores of 20, 16, 8 and 4 respectively on an interval measurement. Here the difference A − C = 20 − 8 = 12 and B − D = 16 − 4 = 12. Interpreting the same in still another way, it can be said that the interval between A and B = 20 − 16 = 4 and the interval between C and D = 8 − 4 = 4. These intervals on being added will be (20 − 16) + (8 − 4) = 4 + 4 = 8. Thus on an interval measurement, the intervals or distances (not the quantities or amounts) can be added. Therefore, in an interval measurement, the difference (or interval) between the numbers on the scale reflects difference in magnitude. However, the ratios of magnitudes are meaningless.

Although it is true that the intervals can be added, it does not mean that the process of additivity can be carried out in the absolute sense. The process of additivity of intervals or distances on an interval measurement has only a limited value because in such a measurement the zero point is not true but rather arbitrary. The zero point, here, does not tell the real absence of the property being measured. It is selected only for some convenience in the measurement. Suppose that the two values are 5 and 8. Obviously, these two values are 5 and 8 points above the arbitrary zero point and, adding these two, we get 8 + 5 = 13, which means that we have a total of 13 points, that is, 13 points above the arbitrary zero point. Now suppose for the time being that the arbitrary zero is itself 3 points above the true zero point. In that case the value 5 should be equal to 8 and the value 8 should be equal to 11 if the same meaning is to be conveyed. Adding 8 + 11, we get 19 and not 13. Thus the process of additivity on an interval measurement cannot be carried out in the absolute sense.

Fahrenheit and Celsius thermometers, [...] in our calendar, and scores on intelligence tests and aptitude tests are good examples of interval measurement. If a person secures a score of zero on any numerical aptitude test, it does not mean that he has no knowledge of numerical operations. Likewise, if a person secures a score of 80 on the same numerical aptitude test, it does not mean that he has a numerical aptitude which is twice that of a person who has secured 40 on the same numerical aptitude test. All this is because the zero point on the numerical aptitude test is not true, rather it is arbitrary. Most psychological tests and inventories are interval scales and therefore have this limitation. The common statistics used in such measurement are the arithmetic mean, standard deviation, Pearson r and other statistics based upon them. Statistics like the t-test and F-test, which are widely used tests of significance, can also be legitimately applied. The only statistic which cannot be applied in interval measurement is the coefficient of variation. The reason is that the coefficient of variation is the ratio of the standard deviation to the arithmetic mean. The standard deviation is a fixed quantity on an interval measurement scale because it is not affected by any shift in the zero point. But the mean tends to vary whenever there occurs a shift in the zero point. When the mean is affected, the coefficient of variation will also be affected. As such, it is advisable not to calculate the coefficient of variation from interval measurements.
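The effect of an arbitrary zero point can be verified numerically. The sketch below reuses the scores 20, 16, 8 and 4 and the shift of 3 points from the discussion above. The helper `coefficient_of_variation` and the use of the population standard deviation are assumptions made for this illustration.

```python
from statistics import mean, pstdev

def coefficient_of_variation(xs):
    """Ratio of the (population) standard deviation to the arithmetic mean."""
    return pstdev(xs) / mean(xs)

scores = [20, 16, 8, 4]                  # scores of A, B, C and D
shift = 3                                # the zero point is moved by 3 points
shifted = [x + shift for x in scores]

# The standard deviation is unaffected by the shift, the mean is not
sd_original, sd_shifted = pstdev(scores), pstdev(shifted)
cv_original = coefficient_of_variation(scores)
cv_shifted = coefficient_of_variation(shifted)

# Adding amounts (not intervals) also depends on where zero is placed
sum_from_arbitrary_zero = 5 + 8
sum_from_shifted_zero = (5 + shift) + (8 + shift)

print(sd_original, sd_shifted)
print(cv_original, cv_shifted)
print(sum_from_arbitrary_zero, sum_from_shifted_zero)
```

The standard deviation stays the same while the coefficient of variation changes with the shift, so the latter carries no fixed meaning on an interval scale.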


Ratio Scale of Measurement 
It is the highest level of measurement and has all the properties of nominal, ordinal and interval 


scales plus an absolute or true zero point. The salient feature of the ratio scale is that the ratio of 
any two numbers is independent of the unit of measurement and therefore, it can meaningfully 
be equated. For example, the ratio 16:28 is equal to 4:7. In ratio measurement, all the nine 
postulates of measurement can be applied. Also, all statistical operations including the 
coefficient of variation can be utilized. 
The common examples of ratio scale are the measures of weight, width, length, loudness, 
and so on. It is obvious, therefore, that ratio scales are common among physical sciences rather 
than among social sciences. To be more clear, a student must know the distinction between the 
interval scale and the ratio scale. The fundamental difference is that in the former, the zero point is arbitrary but in the latter, the zero point is true. Temperature measured in terms of Fahrenheit and Celsius is an example of the interval scale and length measured in terms of feet and inches is an example of the ratio scale. When we measure the length of two sticks and say that one is 3' long and another 6' long, we have a clear idea that the second stick is twice the length of the first stick. This is because 6' means actually 6' from 0', and 3' means actually 3' from 0'. But suppose the maximum temperature of weather today is 40°F and it was 20°F last year on the same date. Can we say, now, that today is twice as warm as it was on the same date last year? Obviously, the answer is no because 0°F does not reflect the complete absence of temperature (true zero point) in Fahrenheit measurement.
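The contrast between the stick and the temperature examples can be sketched in code. The Kelvin scale is brought in here only as an illustration of a temperature scale with a true zero; it is not part of the original discussion, and the function name is hypothetical.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15

# Ratio scale: the ratio of two lengths does not depend on the unit
length_ratio_feet = 6 / 3
length_ratio_inches = (6 * 12) / (3 * 12)

# Interval scale: the ratio of two Fahrenheit readings is not meaningful
naive_ratio = 40 / 20
true_ratio = fahrenheit_to_kelvin(40) / fahrenheit_to_kelvin(20)

print(length_ratio_feet, length_ratio_inches)
print(naive_ratio, round(true_ratio, 4))
```

The length ratio is 2 in either unit, whereas the apparent ratio of 2 between 40°F and 20°F shrinks to a few per cent once temperature is expressed from a true zero.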

In psychology, sociology and education we frequently encounter interval measurement because most of the data obtained from measurement in these sciences are such that we assume an equal unit of measurement and an arbitrary zero point. The basic properties of all these four scales of measurement have been summarized in Table 1.1.


Table 1.1 Basic properties of scales of measurement

                             Property
Types of scales    Magnitude    Equal intervals    Absolute zero
Nominal            No           No                 No
Ordinal            Yes          No                 No
Interval           Yes          Yes                No
Ratio              Yes          Yes                Yes


PROPERTIES OF SCALES OF MEASUREMENT
There are three important properties that make scales of measurement different from one another: magnitude, equal intervals and absolute zero. These are discussed below.

1. Magnitude: Magnitude is defined as the property of "moreness". Any scale is said to have the property of magnitude if it can be said that a particular instance of the attribute represents more, less or equal amounts of the given quantity than does another instance (McCall 1994). For example, on a scale of weight, if it can be said that Mohan is heavier than Sohan, then the scale can be said to demonstrate the property of magnitude. But suppose the experimenter labels his samples as Group I, Group II and Group III, and makes no reference to moreness; the scale will then not be said to have the property of magnitude.

2. Equal intervals: A scale has the property of equal intervals if the difference between any two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units. For example, the difference between 4 kilograms and 6 kilograms on a weight-measuring scale represents the same quantity as the difference between 14 kilograms and 16 kilograms, i.e., exactly 2 kilograms.

The above simplicity of meaning, however, does not apply to a psychological test, which rarely has the property of equal intervals. For example, the difference between IQs of 50 and 60 does not mean the same thing as the difference between IQs of 110 and 120. Although each of these differences is of 10 points (that is, 60 − 50 = 10 and 120 − 110 = 10), the 10 points at the first level do not mean the same thing as 10 points at the second level. In fact, when the scale really has the property of equal intervals, the relationship between the measured units and some outcome can be described by a straight-line equation in the form of y = a + bx. This equation shows that an increase in equal units on a given scale reflects equal increases in the meaningful correlates of the units (Kaplan & Saccuzzo 2001).
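The straight-line relationship y = a + bx can be illustrated with a short sketch. The coefficients a = 2.0 and b = 0.5 and the contrasting squared relationship are hypothetical choices made only to show the idea.

```python
def linear_correlate(x, a=2.0, b=0.5):
    """Outcome related to the scale by a straight line, y = a + b*x."""
    return a + b * x

def nonlinear_correlate(x):
    """A hypothetical outcome that is NOT linearly related to the scale."""
    return x ** 2

# The same 10-point step taken at two different places on the scale
linear_low = linear_correlate(60) - linear_correlate(50)
linear_high = linear_correlate(120) - linear_correlate(110)
nonlinear_low = nonlinear_correlate(60) - nonlinear_correlate(50)
nonlinear_high = nonlinear_correlate(120) - nonlinear_correlate(110)

print(linear_low, linear_high)        # equal increases in the outcome
print(nonlinear_low, nonlinear_high)  # unequal increases in the outcome
```

Only under the linear relationship does a 10-point step mean the same thing at both levels of the scale, which is what the property of equal intervals requires.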

3. Absolute zero: An absolute zero is said to exist when nothing of the property being measured exists. For example, if a doctor is measuring the heart rate of a patient and finds that the patient has a heart rate of zero and has died, the doctor would readily conclude that there is no heart rate at all. But for many psychological measurements it is very difficult, if not impossible, to define absolute zero. For example, if a psychologist wants to assess shyness on a scale from 0 through 5, it is very hard to define what it means for a person to have absolutely no shyness.

These are the basic attributes of scales of measurement over which all the four scales of measurement, as pointed out by Stevens, differ from each other.


FUNCTIONS OF MEASUREMENT

In psychology and education, measurement has varied functions. It may help a class teacher to distinguish between the high achievers and the low achievers; it may help a learning psychologist to diagnose the current status of the reading ability of a particular child; or it may help an industrial psychologist to place a worker in the right job. The primary functions of measurement are as follows.

In Selection
Selection of personnel in industry or other institutions may be carried out with the aid of psychologists. The function of measurement tools in selection is to predict the ability of the individual. Sometimes it is found that the personnel selected fail to prove their mettle and the management is forced to take some drastic action against them, [...] suspension. Here again, measurement plays an important role by providing data on the basis of which suitable disciplinary action can be recommended against such personnel.

In Classification
Measurement also helps in various types of classification which become necessary for a programme to be carried out effectively. For instance, there may be several problem children in a class and their class teacher may wish to separate them in order that the other children in the class do not suffer. Here, measurement will help the teacher in classifying the children into the following categories: retarded, average and gifted. Likewise, a psychiatrist or clinical psychologist may wish to classify his patients into different categories of mental illness and here again, he is helped by the tools of measurement. In industry also, measurement helps personnel to be classified according to the indices of job satisfaction, accident proneness, absenteeism, etc.

In Comparison
The pioneering work of Galton and Darwin has revealed that no two individuals are alike. In other words, there always exist individual differences in traits, mental processes, [...] tendencies, educational achievements, abilities, etc., between any two individuals. Whenever two persons are to be compared on any of the above factors, measurement comes into use. With the help of appropriate measurement, it is possible to conclude that A is better than B in spelling tests or that students of class X of an urban school are better in their academic achievements than their counterparts in a rural school.

In Guidance and Counselling
Measurement also assists psychologists in guiding and counselling. Counselling is a sort of specialized guidance programme and refers to the advice given to an individual so that he can arrive at a workable solution to various adjustment problems in life. Measurement can help an individual to know his strengths and weaknesses; it may provide an insight and understanding into the relationship between the counsellor and the patient; it can help to make accurate predictions regarding problems of adjustment likely to come up in the future and also diagnose mental disabilities, aberrations, deficiencies, etc.

In Research
Measurement helps in research activities, too. In fact, measurement is the fundamental basis of all psychological and educational research. Research is the investigation undertaken to discover new facts about a problem. In psychological and educational research, usually, the effect of one variable or a set of variables is studied while the effects of all other variables are controlled. In doing so, a [...] statistical calculation, a [...] of measurements is resorted to [...]. In psychological or educational research, measurement may be regarded as [...].

In Improving Classroom Instruction
A teacher teaches all the students in the classroom in a similar manner. However, the results are different [...]. Obviously, for some students the instruction is at par with their mental ability; for others it may be either below or above their mental ability. In both the cases, there is a need for modification in the instruction, both quantitatively as well as qualitatively. To achieve this, measurement plays a vital role. By measuring the outcome of the instruction, it becomes clear how many students are being benefited by the instruction and how many are not being benefited by it or are least benefited by it. After such an analysis some essential suggestions for modification in the instruction can be made.

DISTINCTION BETWEEN PSYCHOLOGICAL MEASUREMENT AND PHYSICAL MEASUREMENT

Measurement has [...] quantitative measurement. Psychological measurement comprises the measurement of mental [...], traits, habits, tendencies, and the like of an individual whereas physical measurement comprises the measurement of objects, things, etc., which are often physically present in the world. Usually, physical measurement is concerned with the measurement of height, weight, length, size, volume, etc. Given below is a clear-cut distinction between psychological and physical measurement.

1. In physical measurement, the unit of measurement is fixed and constant throughout the measurement whereas in psychological measurement the unit of measurement is not fixed and constant during the process of measurement. For example, a kilogram or an inch has the same meaning at whatever place the measurement is being taken and it conveys the same physical [...] or meaning throughout the measurement. But suppose one is measuring intelligence. There is no fixed unit of measurement in this case because some may measure intelligence on the basis of verbal questions or items answered in a specified time; others may prefer to measure intelligence on the basis of some manipulative tasks done in a specified period; still another group may prefer to measure intelligence on the basis of both time and error in the completion of a task, and so on. Moreover, these units tend to vary themselves during the process of measurement because there is no standard method of presenting a uniform set of difficulties to all examinees. For example, a particular item to measure intelligence may seem very easy to one examinee but may seem very difficult and challenging to another.

2. In physical measurement, there is a true zero point whereas in psychological measurement there is an arbitrary zero point. [...] the absence of the trait being measured. For example, when a person gets a score of zero in a numerical ability test, it does not mean that he has no knowledge of numerical operations at all. But an object having zero length will be said to have no length at all.

3. Physical measurement is more accurate and predictable than psychological measurement. This is because in physical measurement, there is a true zero point. For example, a stick of 20" length means the stick is definitely 20" above the zero inch, and similarly, a stick of 60" length means that it is thrice as long as the first stick. But a person scoring 15 points in an intelligence test cannot be said to have scored 15 points above the zero point because here the zero point is itself not known. Likewise, another person securing a score of 30 cannot be said to be twice [...] the first person. Therefore, no prediction can be made with definite accuracy.

4. Physical measurement is direct whereas psychological measurement is indirect. When we measure in inches the length of a cloth, we place it before us and directly measure its length. But it is not possible to measure the 'extroversion' trait of personality or intelligence directly because it cannot be placed physically before us. Extroversion can only be measured indirectly through some responses given by the person concerned and for measuring intelligence, we will have to depend upon some responses, verbal or manipulative.


av a 


tire quantity can be measured, whereas in Psychologie, 
il ‘ Yu | apa 
hysical measurement the sh st be measured but only a sample representing th 
; - ity anno Cs ok | | c : 
f nent th ST cay or Wi ance, one is to measure the lengths and weights of 
measurement th sasured. Say, for inste , one eeene eee 
Hetty oT inor 's home. The entire lengths and weights of ¢ | ss and ct 
ables oT oie ae in terms of inches and kilograms respectively. But SUPPOSE One jg 
eee a tian Na titude of class X students of Bihar. Then ordinarily, it is not Possible 
es Ce a ) ; asa \ ot , | r 
measuring the avons | aptitude (through an appropriate test) of every student of cag 
‘e the anice vue f every st i 
Saute sat : Bihar. Naturally, one would randomly draw a sample of students who ate 
i eo cll. c cANly, 7 \ ia I | 
oe tative of class X and measure their mechanical aptitude. 
taken to be representative Of Class . 
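The random draw described above can be sketched in a few lines of Python. This is only an illustration: the roll-number list, the sample size of 50 and the function name `draw_sample` are assumptions made for the example, not details from the text.

```python
import random

def draw_sample(population, n, seed=0):
    """Draw n members at random, without replacement, from the population."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(population, n)

# Hypothetical roll numbers standing in for all class X students.
population = [f"student_{i}" for i in range(1, 1001)]

# Only the sampled students would actually take the aptitude test.
sample = draw_sample(population, 50)
```

Every student has the same chance of being selected, which is what allows the sample to be treated as representative of the whole class.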


STEPS IN MEASUREMENT PROCESS 


Psychologists and educators have shown keen interest in the measurement of human abilities, interests, attitudes and personality traits. Whatever is being measured in any field, the measurement involves three common steps:


(i) Identifying and defining the quality or attribute 


(ii) Determining operations for isolating and displaying the attribute
(iii) Quantifying the attribute. 
A discussion follows. 


(i) Identifying and Defining the Quality or Attribute The first step in measurement is to identify the attributes that are relevant and important to measure. For example, if one is selecting employees to become truck drivers, the ability of verbal comprehension and the ability to solve quantitative problems will not be relevant attributes. However, other abilities or functions such as eye-hand coordination, depth perception and low accident records will be the relevant attributes.

After selecting the relevant attribute, the next step will be to define it. Psychologists and educators face great difficulty in defining abstract and difficult-to-observe properties of the people. Such a property is technically called a construct. Intelligence, for example, is a construct. Frequently asked questions here are: What kind of behaviour shall we understand as intelligence? Will it include skill in social interaction? Will it relate to the ability to deal with only concrete objects? Will it refer to the ability to adjust to a novel situation? Will it refer to the speed and fluency of a response? A psychologist or educator must arrive at a clear, precise and generally accepted definition of the attribute considered as the relevant one.


- pin 
(ii) Determining Operations for Isolating and Displaying the Attribute The second step in developing a measurement procedure is how to find a set of operations that will isolate the attribute of interest and correctly display it. As we know, when an attribute is defined by how it is measured, it is said to have an operational definition. In so far as physical measurement is concerned, this step is not very difficult and the person easily determines the operations through which the attribute can be measured. For example, if one wants to measure the length of the table, the ruler and the tape measure are uniformly accepted instruments and laying one of them along the table is the correct procedure for assessing the length of the table. In psychological and educational measurement, this process is a bit complicated because several types of instruments and procedures have been invented for eliciting behaviours which serve as indicators of the relevant attributes of the people. For example, how do we operationally define intelligence? What behaviours are indicative of the attribute, intelligence? The fact that there is no single, universally accepted test of intelligence and that different tests include different types of tasks for defining intelligence becomes evidence for the fact that we don’t have a general consensus on what the appropriate procedures are for eliciting intelligent behaviour. This lack of consensus is the general characteristic of psychological and educational measurement and is the outcome of ambiguity in the definitions of the attribute and the varieties of instruments devised for eliciting the relevant behaviours of the attribute.







(iii) Quantifying the Attribute Once the set of operations for identifying the attribute has been accepted, the next step is to express their outcome in quantitative terms. This is usually done by assigning some numerical values according to a set of rules. This set of rules allows one to answer questions such as how many? In the case of the length of a table, the question basically becomes, how many inches? how many feet? In these physical measures any two inches are equal and this is established by direct comparison. However, in psychological and educational measurement of attributes, there are no such units whose equality can be demonstrated by direct comparison in the way that the equality of inches is shown. Psychologists or educators have to depend on a somewhat arbitrary definition to provide units for quantification. Generally, a researcher considers one task successfully completed, such as an arithmetic problem solved, a word defined or an attitude statement endorsed, equal to any other single task and, subsequently, uses the total number of successes or endorsements for a person as the value representing the person on that attribute. Such a count of tasks successfully completed provides a manageable definition of the amount but has no adequate evidence for the equivalence of different tasks in the series.

As said above, in quantifying the attribute, numerical values are used. Using numerical values in measurement has several advantages of which two are important ones.

(a) Use of numerical values in measurement facilitates communication and makes it meaningful and precise. For example, when we say that Mohan is 6 feet tall or the weight of Ram is 60 kg, a clear meaning regarding height and weight is communicated.

(b) Another advantage of quantification is that it allows useful mathematical operations in summarizing the quantitative information. For example, by adding, subtracting, multiplying and dividing several pieces of information, we find the average to convey the summary of the performance.
These are the major steps in the measurement process that a psychologist or educator has to 
go through. 
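The quantification step can be illustrated with a small Python sketch. The three examinees and their item responses are hypothetical; the only rule taken from the text is that each successfully completed task counts as one unit, and that totals can then be averaged.

```python
from statistics import mean

def total_score(responses):
    """Count tasks completed: each success (1) is treated as one equal unit."""
    return sum(responses)

# Hypothetical responses: 1 = task completed or statement endorsed, 0 = not.
responses = {
    "A": [1, 1, 0, 1, 0],
    "B": [1, 0, 0, 0, 0],
    "C": [1, 1, 1, 1, 0],
}

totals = {name: total_score(r) for name, r in responses.items()}
average = mean(list(totals.values()))  # summary of the group's performance
```

Treating every task as one unit is what makes the total easy to compute, but the arithmetic itself provides no evidence that the tasks are truly equivalent.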


PROBLEMS RELATED TO THE MEASUREMENT PROCESS 
We have seen that there are three steps in the measurement process. In psychological and educational measurement, several problems relating to each of these three steps are encountered.


(i) The researcher encounters problems in selecting the attributes of interest and in defining them clearly and unequivocally. In defining concepts like intelligence, anxiety, adjustment, cooperativeness, and so on, we expect diversity in definition. For example, in defining intelligence, to what extent should a definition include each of the following abilities:
* ability to memorize
* ability to profit by past experience
* ability to carry out abstract reasoning
* ability to make adjustment in various situations
* ability to generalize from an event

(ii) The second problem relates to devising procedures for eliciting the relevant attributes. For some attributes, psychologists or educators have succeeded in identifying suitable measurement operations. For example, in the domain of abilities, they have been able to develop various standardized tests through which measurement is done. But for attributes such as a client's anxiety and fear or an employee's initiativeness, no such satisfactory standardized measures are available. It is hoped that in future psychologists or educators, through their active research, will be able to find standard operations for assessing these attributes also.


(iii) The third problem relates to the equality of units. In psychological and educational measurements, units are set equal by definition and, in reality, their equality cannot be demonstrated in the real sense. In a foot, there are twelve inches and all inches or units of a foot are equal. But in intelligence tests, are items of analogy such as ‘Hot is to cold as wet is to ...’ equal to items of arithmetical series such as 2, 8, ..., 17, ...? Such inequality of units creates a problem in the measurement process. Perhaps due to such inequality, addition, subtraction and comparison of scores remain a suspect issue.

Despite all these problems in psychological and educational measurement, it has been found that the information provided by tests is more accurate than the information available from other sources.
GENERAL PROBLEMS OF MEASUREMENT 
Science is a kind of social institution. Its scientific value lies in meaningfully communicating observations of events or individuals to other persons. It is towards this end that measurement, which is a sort of quantitative description of events or individuals, makes a significant contribution. As a matter of fact, the progress of any discipline is today judged by the extent to which it has been able to make a quantitative description (i.e., measurement) of its subject matter. The reason why we attach such importance to measurement is that it allows extremely accurate and objective quantitative description of events. In fact, accuracy and objectivity are its two principal advantages.

Despite the fact that measurement is the heart of the social sciences, it has certain problems which social scientists need to look into. Measurement would not be of much importance till these difficulties have been removed. Some of these problems are enumerated below.


Indirectness of Measurement 

Most psychological and educational measurements are indirect. This is because most psychological and educational variables cannot be observed and studied directly. For example, suppose a teacher wants to measure the intelligence of students of a particular class. As intelligence cannot be directly seen, touched, or experienced, the teacher has to depend upon measures which include a sample of behaviour representative of an intelligent act. Such a sample, however, may itself suffer from a number of limitations. For example, these measures may not be reliable, valid, and practical or may not be objective and truly representative of the actual behaviour being measured. In such a situation, measurement of any trait or variable itself becomes a source of perpetual difficulty.


Incompleteness of Measurement 

Psychological and educational measures are generally incomplete, and, therefore, the 
measurement of any psychological or educational variable is also incomplete. For example, 
when an investigator is assessing the attitude towards co-education, he is required to construct a 
scale in which a number of samples of behaviour expressing such an attitude need to be 
incorporated. This number has no limit. Any attempt, therefore, to measure such an attitude 
would be partial and incomplete. In such a situation, measurement will be dubious and tend to 
create a misleading index of attitude.





Relativity of Measurement 


Psychological and educational measurements are relative. This is also true of sociological measurement. The concept of relativity in measurement can be explained through an example. Suppose Mohan, a student of class X, was given two tests: a test of arithmetical knowledge and a test of the English language. Let us further suppose that he correctly answered 60% of the items in the English language test but could not answer a single item in the arithmetic test. On this basis, can it be said that his performance in the English language test was good? The answer obviously cannot be given with certainty. The 60% of the items answered by Mohan may be the percentage for even those students of the class who are much below the average. On the other hand, the test of English language may have been a very difficult test and only Mohan and nobody else might have answered 60% of the items. Thus, the measurement is not absolute but relative and we cannot draw a conclusion from the measurement of Mohan’s performance unless it is compared with the performance of the group, that is, with other members of the class. Likewise, can we say, on the basis of his zero score on the arithmetical knowledge test, that Mohan has no knowledge of arithmetic? We cannot say so because the zero obtained in the arithmetical knowledge test does not mean a total absence of arithmetical knowledge. Nothing can be definitely said until a comparison with other members of the class is done. All these measurements are, therefore, relative and must be carefully dealt with if measurement is to be meaningful and objective.
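The point about Mohan's 60% can be made concrete with a short Python sketch. The two sets of class scores are invented for illustration, and the percentile-rank formula used here (percentage of the group scoring strictly below) is one common convention, assumed rather than taken from the text.

```python
def percentile_rank(score, group_scores):
    """Percentage of the group scoring strictly below the given score."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)

mohan = 60

# Hypothetical class results (percent correct); Mohan's 60 is included in each.
easy_test_class = [85, 90, 75, 80, 95, 70, 88, 92, 78, 60]
hard_test_class = [20, 35, 25, 40, 30, 15, 45, 10, 38, 60]

rank_on_easy_test = percentile_rank(mohan, easy_test_class)
rank_on_hard_test = percentile_rank(mohan, hard_test_class)
```

The same raw score places Mohan at the bottom of one class and at the top of the other, which is why the score has no meaning until it is compared with the group.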


Errors in Measurement 
Measurement in the physical sciences as well as in the behavioural sciences is most of the time affected by several factors which produce gross errors. Suppose a weighing machine determines a woman’s weight to be 50 kg. This weight might not be her pure weight. There may be some minor mechanical troubles in the machine itself so that her weight is inflated; it may be that she has just taken her meal; it may be that she is pregnant; there may be other factors present in the physical environment. All these sources of error might inflate or reduce her actual weight. Similar sources of error run into psychological, educational and sociological measurement. When we are measuring the intelligence of a child with the help of an intelligence test, there can be several such factors which tend either to decrease or increase his actual score. For example, the child might be nervous; he might have been distracted by the sound of an aeroplane; he might not have understood the meaning of the items clearly, and so on. All these sources of errors in measurement create problems which adversely affect the scientific value of measurement.
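One way to picture how such factors inflate or reduce a score is the additive model of classical test theory, in which an observed score is the actual score plus an error. That model is an assumption brought in for this sketch, not something stated above, and the numbers are hypothetical.

```python
from statistics import mean

def observed_scores(true_score, errors):
    """Observed score = actual score + error (assumed additive model)."""
    return [true_score + e for e in errors]

true_score = 100              # hypothetical actual score of a child
errors = [-4, 3, 0, -2, 5]    # e.g. nervousness, distraction, a lucky guess

observed = observed_scores(true_score, errors)
average_observed = mean(observed)
```

Each single administration misses the actual score by its own error; the average of several administrations lands closer to it only to the extent that the errors cancel out.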


SOURCES OF ERRORS IN MEASUREMENT 


In any ideal research study the measurement should be precise, scientific and unambiguous. But 

this goal is not always reached because of the various sources of errors often encountered. The 

following are the important and possible sources of errors in measurement: 

(i) Respondent: One important source of errors in measurement is the respondent himself. Sometimes the respondent is found reluctant to express his true feelings or it may be that sometimes, due to lack of knowledge, the person may not express himself clearly. In either case, the measurement loses its accuracy.

(ii) Measurer: Sometimes the behaviour, style and looks of the person who is measuring the phenomena distort the process of measurement. His behaviour, style and looks may encourage or discourage certain types of replies from the respondent, which affects the accuracy of the measurement. At the data-analysis step, errors may creep into the measurement due to incorrect coding, faulty tabulation and inappropriate statistical analysis done by the measurer or the researcher.

(iii) Situation: Situational factors also contribute to errors in measurement. Any situation that puts unnecessary strain on the respondent tends to introduce errors in measurement. An interview is one example of such a situation. Apart from this, if the person feels that the situation does not protect anonymity, it also introduces error in measurement.

(iv) Test instrument: Errors may be introduced in the measurement due to poor psychometric qualities of the test and a defective measuring instrument. Psychological tests having poor reliability and validity may result in measurement errors.


To avoid the errors resulting from the above-mentioned sources, it is essential that the researcher should try to eliminate or neutralise these possible errors as far as possible.


DIFFERENCE AMONG ASSESSMENT, TESTING AND MEASUREMENT 

Assessment, testing and measurement are often confused. Therefore, it is essential that these three terms be clearly distinguished. Assessment is a general term that includes any of a variety of procedures used to obtain information about a person's performance, such as paper-and-pencil exercises and the performance of authentic tasks such as laboratory work by the student. The basic goal of assessment is to evaluate a person in terms of current and future functioning. In the process of assessment, behaviours are classified into different categories measured against a standard. In fact, assessment answers the basic question: How well does the individual perform?

Testing is done through a systematic procedure for measuring a sample of behaviour by putting a set of questions in a uniform manner. Such a systematic procedure is called a test. Thus tests are used in assessment processes. Since tests are a form of assessment, they also answer the related question: How well does the individual perform either in comparison with others or in comparison with performance of any task? However, not all assessment techniques are tests. In the strict sense, any assessment technique is called a test only when its procedures for administration, scoring and interpretation are standardized, there is a standardization sample, and there is evidence for its reliability and validity. Many so-called tests can more appropriately be called assessment devices because they don’t meet the minimum requirements of a test.

Measurement, as discussed in the beginning of the chapter, is only a process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. This numerical description is done according to some rules. In fact, measurement answers the basic question: How much?


Review Questions 


1. Make a distinction between measurement and evaluation.
2. Discuss the different levels of measurement and give appropriate examples.
3. What do you mean by measurement? Discuss the important functions of measurement in education and psychology.
4. Distinguish between psychological measurement and physical measurement. Give relevant examples.
5. Discuss the general problems of measurement and give suitable examples.
6. Explain the nature and characteristics of measurement in psychology. Distinguish between ordinal and interval scales of measurement.
7. What is meant by psychological measurement? Discuss its main requirements.
8. What is meant by psychological measurement? State the general problems related to measurement in psychology.
9. Distinguish between testing and assessment. Discuss the various sources of errors in measurement.
10. Discuss the various properties of scales that distinguish between the different levels of scales of measurement.


2
TEST CONSTRUCTION


CHAPTER PREVIEW


Meaning of Test in Psychology and Education 
Classification of Tests
Characteristics of a Good Test
(i) Objectivity
(ii) Reliability
(iii) Validity
(iv) Norms
(v) Practicability/Usability
General Steps of Test Construction
(i) Planning of the Test
(ii) Writing Items of the Test
(iii) Preliminary Administration of the Test
(iv) Reliability of the Final Test
(v) Validity of the Final Test
(vi) Norms of the Final Test
(vii) Preparation of Manual, and Reproduction of the Test
Uses and Limitations of Psychological Tests and Testings
Ethical Issues in Psychological Testing
Steps in Selecting Appropriate Published Test


MEANING OF TEST IN PSYCHOLOGY AND EDUCATION 


According to the dictionary, ‘test’ is defined as a series of questions on the basis of which some information is sought. In psychology and education, the meaning of test is something more than this. A psychological (or an educational) test is a standardized procedure to measure quantitatively or qualitatively one or more than one aspect of a trait by means of a sample of verbal or nonverbal behaviour. The purpose of a psychological test is twofold. First, it attempts to compare the same individual on two or more than two aspects of a trait; and second, two or more than two persons may be compared on the same trait. Such a measurement may be either quantitative or qualitative. In the words of Bean (1953, 11), a test is “an organized succession of stimuli designed to measure quantitatively or to evaluate qualitatively some mental process, trait or characteristic”. Likewise, Anastasi and Urbina (1997) have defined a psychological test as “essentially an objective and standardized measure of sample of behaviour”. Similarly, Cullari (1998) has said, “A test is a standardized procedure for sampling behaviour and describing it with scores or categories”. Kaplan and Saccuzzo (2001) have opined, “A psychological test or educational test is a set of items designed to measure characteristics of human beings that pertain to behaviour”. These definitions reveal some important characteristics of a psychological and educational test.




First, a test is an organized succession of stimuli, which means that the stimuli (popularly known as items) in the test are organized in a certain sequence and are based upon some principles of test construction. Usually, the items of a test are processed through item analysis and its procedure of administration is standardized to ensure that procedures remain uniform for different examiners and different situations. The lack of standardization may change not only the character of the test but also its difficulty level, which may ultimately reduce the validity of the test.

Second, both quantitative and qualitative measurement are possible with psychological and educational tests. The reading ability of a child may be quantitatively measured with the help of a test specially designed for the purpose. His reading-ability score may also be evaluated (qualitatively measured) with respect to the average performance in reading ability of the other children of his age or class. Thus, a test provides both quantitative and qualitative measurement of a trait.

Third, a psychological test is based upon a limited sample of behaviour. What does this mean? Obviously, by this it is meant that any psychological test or educational test does not assess the totality of a person's behaviour. Rather it is focussed on a limited sample of that behaviour. For example, when testing the vocabulary of a person, the test constructor must settle for a sample of 40 to 50 words and predict that person’s word knowledge from this limited sample. In reality, the totality of the person's word knowledge might be poorer or stronger than the 40- or 50-word vocabulary test suggests. Obviously, then, the implication of the limited-sample concept is that the test score invariably contains some degree of error. Such measurement errors can be minimized by means of a careful test design, but they can never be fully eliminated.
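The error carried by a limited sample of words can be given a rough size. The sketch below is an illustration under assumptions that go beyond the text: the 40 test words are treated as a simple random sample of the person's possible vocabulary, and the binomial standard error is used as the measure of uncertainty. The result of 30 known words out of 40 is hypothetical.

```python
import math

def estimate_known_proportion(sample_results):
    """Estimate the proportion of words known, with its standard error."""
    n = len(sample_results)
    p = sum(sample_results) / n
    standard_error = math.sqrt(p * (1 - p) / n)  # binomial approximation
    return p, standard_error

# Hypothetical outcome on a 40-word test: 1 = word known, 0 = not known.
sample_results = [1] * 30 + [0] * 10

p, se = estimate_known_proportion(sample_results)
```

Here the estimate is 0.75 with a standard error of roughly 0.07, so the person's true word knowledge could plausibly lie several points above or below what the 40-word test shows.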

Fourth, psychological tests usually provide scores or categories which are, subsequently, interpreted with reference to a standardization sample. The standardization sample should be representative of the population for whom the test is meant so that it may be possible to evaluate each person's test score or results in comparison to the reference group. For example, knowing that a college student scored 120 on a test of abstract reasoning conveys little meaning. But if we know that the average score for a college student was 110 and that only 2 per cent of these students scored 120 or above, we have a definite basis for making a test prediction, that is, we can say that the examinee has a good prospect in college. This point, in itself, illustrates that it is not the test result per se that is valuable; rather, what is valuable is what the test result signifies about the nontest behaviours that are of primary interest.


Some psychological tests are norm-referenced tests, which means that results from them are interpreted with reference to the average performance of the standardization sample, whereas some psychological tests are criterion-referenced, which are, in fact, used to determine where an examinee stands with reference to a tightly defined criterion or educational objective. On such tests, the comparison is done with an objective standard rather than with the performance of other examinees. For example, results of a criterion-referenced arithmetic test might state that a student does simple arithmetic operations like addition, subtraction, multiplication and division of three digits with 80% accuracy whereas the goal of the school system is 90%. Thus, here, the performance of other students is irrelevant and what is relevant is whether the student meets the accepted criterion.
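The two ways of interpreting a result can be set side by side in a short Python sketch. The figures are the ones used in the examples above (a score of 120 against a group average of 110; 80% accuracy against a 90% goal); the function names are hypothetical.

```python
def norm_referenced(score, norm_mean):
    """Interpret a score by its distance from the standardization-sample average."""
    return score - norm_mean

def criterion_referenced(accuracy, criterion):
    """Interpret a result by whether it meets a fixed, predefined standard."""
    return accuracy >= criterion

# Norm-referenced: the abstract-reasoning score compared with the group average.
points_above_average = norm_referenced(120, 110)

# Criterion-referenced: the arithmetic result compared with the school's goal.
meets_goal = criterion_referenced(0.80, 0.90)
```

The first function needs information about other examinees; the second needs only the standard itself, which is the distinction the paragraph draws.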
Finally, for getting the meaning of a psychological test or educational test clear, it is also essential that we make a distinction between testing and assessment. In fact, testing is a process consisting primarily of administering, scoring and interpreting the test. Assessment, on the other hand, is a more comprehensive and wider term that includes the entire process of compiling and synthesizing the information to make a prediction about the person.


CLASSIFICATION OF TESTS 


Psychologists and educators have taken different criteria into account for classifying tests. A brief introduction to the classification from the point of view of these criteria is as follows:

1. On the basis of the criterion of administrative conditions
Tests have been classified into two types on the basis of administrative conditions: individual tests and group tests. Individual tests are those tests that are administered to one person at a time. Kohs Block Design Test is an example of an individual test. Individual tests are often used by school psychologists and counsellors to motivate children and to observe how they respond. Some individually administered tests are given orally and they require the constant attention of the examiner. Individual tests, in general, are time-consuming and require the services of trained and experienced examiners. Such tests are used only when a crucial decision is necessary.

Group tests are tests which can be used among more than one person or in a group at a time. Bell Adjustment Inventory is an example of the group test. Besides assessing adjustment, group tests are adequate for measuring cognitive skills, for surveying the achievements, strengths and weaknesses of the students in the classroom, etc.

2. On the basis of the criterion of scoring
Scoring is one of the vital parts of a test. Based upon this criterion, tests are classified into two types: objective test and subjective test. Objective tests are those whose items are scored by competent examiners or observers in such a way that no scope for subjective judgement or opinion exists and thus, the scoring remains unambiguous. Tests having multiple-choice, true-false and matching items are usually called objective tests. In such items, the problem as well as its answer is given along with the distractor. The problem is known as the stem of the item. A distractor answer is one which is similar to the correct answer but is not actually the correct one. Such tests are also known as new-type tests or limited-answer tests.

Subjective tests are tests whose items are scored by the competent examiners or observers in a way in which there exists some scope for subjective judgement and opinion. As a consequence, some elements of vagueness and ambiguity remain in their scoring. These are also called essay tests. Such tests are intended to assess an examinee’s ability to organize a comprehensive answer, recall and select important information, and present the same logically and effectively. Since in these tests the examinee is free to write and organize the answer, they are also known as free-answer tests. The following items illustrate the nature of an essay test:
(i) Discuss the role of past experience in perception. 
(ii) What are the major goals of education? To what extent have these goals been achieved in 
India? 
When an examinee answers these questions or other similar items, he usually selects, recalls 
and organizes his experiences in the manner he likes. 
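The reason objective items leave no scope for the scorer's judgement is that scoring reduces to matching responses against a fixed key. A minimal Python sketch follows; the three-item key and the examinee's answers are hypothetical.

```python
def score_objective(answers, key):
    """Count the items whose response matches the scoring key exactly."""
    return sum(1 for item, correct in key.items() if answers.get(item) == correct)

# Hypothetical key: item number -> correct option.
key = {1: "b", 2: "True", 3: "d"}

# Hypothetical answer sheet; item 2 was answered with a distractor.
answers = {1: "b", 2: "False", 3: "d"}

score = score_objective(answers, key)
```

Any two examiners applying the same key to the same answer sheet must arrive at the same score. No comparable key can be written for the essay items above, which is where subjective judgement enters.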


3. On the basis of the criterion of time limit in producing the response
Another way of classifying tests is whether they emphasize time limit or not. On the basis of this criterion, the tests are classified into power tests and speed tests. A power test is one which has a generous time limit so that most examinees are able to attempt every item. Usually such tests have items which are generally arranged in increasing order of difficulty. Most of the intelligence tests and aptitude tests belong to the category of power tests. In fact, power tests demonstrate how much knowledge or information the examinees have.

Speed tests are those that have severe time limits but the items are comparatively easy and the difficulties involved therein are more or less of the same degree. Here, very few examinees are expected to make errors. Speed tests, generally, reveal how rapidly, i.e., with what speed, the examinees can respond within a given time limit. Most of the clerical aptitude tests belong to this category.

In fact, whether a test is a power test or a speed test depends, in part, on the nature of the examinees for whom it is meant. An arithmetical test for class VII students might emphasize speed if it contained items that were easier for them, but the same test could be a power test for younger students (for example, class IV) or for less-prepared students. Today, a pure power test or pure speed test is rare; rather, a mixture of the two is common.




4. On the basis of the criterion of the nature or contents of items
Tests may be classified on the basis of the nature of the items or the contents used therein. Important types of tests based on this criterion are:
(i) Verbal test
(ii) Nonverbal test
(iii) Performance test
(iv) Nonlanguage test

(i) A verbal test is one whose items emphasize reading, writing and oral expression as the primary mode of communication. Herein, instructions are printed or written. These are read by the examinees and, accordingly, items are answered. Jalota Group General Intelligence Test and Mehta Group Test of Intelligence are some common examples. Verbal tests are also called paper-pencil tests because the examinee has to write on a piece of paper while answering the test items.

(ii) Nonverbal tests are those that de-emphasize but don’t altogether eliminate the role of language by using symbolic materials like pictures, figures, etc. Such tests use language in the instructions but not in the items. Test items present the problem with the help of figures and symbols. Nonverbal tests are commonly used with young children as an attempt to assess the nonverbal aspects of intelligence such as spatial perception. Raven Progressive Matrices is a good example of a nonverbal test.
(iii) Performance tests are those that require the examinees to perform a task rather than answer some questions. Such tests prohibit the use of language in items. Occasionally, oral language is used to give instruction, or the instruction may also be given through gesture and pantomime. Different kinds of performance tests are available. Some tests require examinees to assemble a puzzle, place pictures in a correct sequence, place pegs in the boards as rapidly as possible, point to a missing part of the picture, etc. One feature of performance tests is that they are usually administered individually so that the examiner can count the errors committed by the examinee or the student and can assess how long it takes him to complete the given task. Whatever may be the types of performance test, the common feature of all performance tests is their emphasis on the examinee's ability to perform a task rather than answer some questions.
(iv) Nonlanguage tests are those which don't depend upon any form of written, spoken or reading communication. Such tests remain completely independent of the ability to use language in any way. Instructions are usually given through gestures or pantomime and the examinees respond by pointing at or manipulating objects such as pictures, blocks, puzzles, etc. Such tests are usually administered to those persons or children who can't communicate in any form of ordinary language.


5. On the basis of the criterion of purpose or objective

Tests are also classified in terms of their objectives or purposes. Based upon this criterion, tests are usually classified as intelligence tests, aptitude tests, personality tests, neuropsychological tests and achievement tests. Intelligence tests intend to assess intelligence of the examinees. Aptitude tests assess potentials or aptitudes of the persons. Personality tests assess traits, adjustments, interests, values, etc., of the persons. Neuropsychological tests are the tests which are used in the assessment of persons with known or suspected brain dysfunction. Achievement tests assess what the persons have acquired in the given area as a result of some training or learning.

6. On the basis of the criterion of standardization

Tests are also classified on the basis of standardization. Based upon this criterion, tests are classified into standardized tests and teacher-made tests. Standardized tests are those which have been subjected to the procedure of standardization. However, the meaning of the term 'standardization' is controversial and includes at least the following conditions:

(i) The first condition for standardization is that there must be a standard manner of giving instructions so that uniformity can be maintained in the evaluation of all those who take the test.
(ii) The second condition for standardization is that there must be uniformity of scoring, and an index of fairness of correct answers through the procedure of item analysis should be available.
(iii) The third condition is that the reliability and validity of the test must be established and the individuals for whom the test is intended should be explicitly mentioned.
(iv) The fourth condition, a controversial one, is that a standardized test should have norms. However, according to Cronbach (1970, 27), a test even without norms may be called a standardized test. But the majority of psychologists favour the idea that a standardized test should have norms as well.

By way of summarizing the meaning of a standardized test, it can be said that standardized tests, constructed by test specialists, are standardized in the sense that they have been administered and scored under standard and uniform testing conditions so that the results obtained from different samples may legitimately be compared. Items of standardized tests are fixed and not modifiable.


Teacher-made tests are those that are constructed by teachers for use largely within their classrooms. The effectiveness of such tests depends upon the skill of the teacher and his knowledge of test construction. Items may come from any area of the curriculum and they may be modified according to the will of the teacher. Rules for administration and scoring are determined by the teacher. Such tests are largely evaluated by the teachers themselves and no particular norms are provided; however, they may be developed by the teacher for his own class.

Thus, we find that tests have been classified in terms of various criteria. These tests are used for a variety of purposes.


CHARACTERISTICS OF A GOOD TEST 


For a test to be scientifically sound, it must possess the following characteristics. 
Objectivity 


A test must have the trait of objectivity, i.e., it must be free from the subjective element so that there is complete interpersonal agreement among experts regarding the meaning of the items and the scoring of the test. Obviously, objectivity here relates to two aspects of the test: objectivity of the items and objectivity of the scoring system. By objectivity of items is meant that the items should be phrased in such a manner that they are interpreted in exactly the same way by all those who take the test. For ensuring objectivity of items, items must have uniformity of order of presentation. By objectivity of scoring is meant that the scoring method of the test should be a standard one so that complete uniformity can be maintained when the test is scored by different experts at different times.


26 Tests, Measurements and Research Methods in Behavioural Sciences


Reliability

A test must also be reliable. Reliability here refers to self-correlation of the test. It shows the extent to which the results obtained are consistent when the test is administered once or more than once. Consistency in results obtained in a single administration is the index of internal consistency of the test and consistency in results obtained upon testing and retesting is an index of temporal consistency. Reliability, thus, includes both internal consistency as well as temporal consistency. For a test to be called sound, it must be reliable because reliability indicates the extent to which the scores obtained in the test are free from such internal defects of standardization which are likely to produce errors of measurement.

Planning of the Test

The first step in test construction is the careful planning. At this stage, the test constructor specifies the objectives of the test in clear terms. He decides upon the nature of the items and the type of instructions to be included, the method of sampling, a detailed arrangement for the preliminary administration and the final administration, a probable length and time limit for the completion of the test, probable statistical methods to be adopted, etc. Planning also includes the total number of reproductions of the test to be made and a preparation of the manual.


Validity

Validity is another prerequisite for a test to be sound. Validity indicates the extent to which the test measures what it intends to measure, when compared with some outside independent criterion. In other words, it is the correlation of the test with some outside criterion. The criterion should be an independent one and should be regarded as the best index of the trait or ability being measured by the test. Generally, validity of the test is dependent upon the reliability because a test which yields inconsistent results (poor reliability) is ordinarily not expected to correlate with some outside independent criterion.
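
Because both reliability (self-correlation) and validity (correlation with an outside criterion) are described above as correlations, a small worked sketch may help. The scores below are invented for illustration, and the helper name `pearson_r` is an assumption of this example rather than anything prescribed by the text; it computes the ordinary Pearson product-moment correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores of six examinees (invented for illustration only).
first_testing = [12, 15, 9, 20, 17, 11]
retesting = [13, 14, 10, 19, 18, 10]      # same test, administered again later
criterion = [55, 60, 48, 72, 65, 50]      # an assumed independent outside criterion

# Temporal consistency: correlation of the test with itself on retesting.
test_retest_reliability = pearson_r(first_testing, retesting)
# Validity: correlation of the test with the outside criterion.
validity_coefficient = pearson_r(first_testing, criterion)

print(round(test_retest_reliability, 2), round(validity_coefficient, 2))
```

In this sketch the same function serves both purposes; only the second set of scores changes (a retest for reliability, an independent criterion for validity).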


Writing of the Test Items

The second step in test construction is the preparation of the items of the test. According to Bean (1953, 15), an item is defined as "a single question or task that is not often broken down into any smaller units." Item writing starts with the planning done earlier. If the test constructor decides to prepare an essay test, the essay items are written down. However, if he decides to construct an objective test, he writes down the objective items such as the alternative response item, matching item, multiple-choice item, completion item, short-answer item, pictorial form of item, etc. Depending upon the purpose, he decides to write any of these objective types of items.

Item writing is essentially a creative art. There are no set rules to guide and guarantee writing of good items. A lot depends upon the item writer's intuition, imagination, experience, practice and ingenuity. However, there are some essential prerequisites, which must be met if the item writer wants to write good and appropriate items. These requirements are enumerated as follows:

1. The item writer must have a thorough knowledge and complete mastery of the subject matter. In other words, he must be fully acquainted with all facts, principles, misconceptions and fallacies in a particular field so that he may be able to write good and appropriate items.

2. The item writer must be fully aware of those persons for whom the test is meant. He must be aware of the intelligence level of those persons so that he may manipulate the difficulty level of the items for proper adjustment with their ability level. He must also be able to avoid irrelevant clues to the correct responses.


Norms

A test must also be guided by certain norms. Norms refer to the average performance of a representative sample on a given test. There are four common types of norms: age norms, grade norms, percentile norms and standard score norms. Depending upon the purpose and use, a test constructor prepares any of these norms for his test. Norms help in interpretation of the scores. In the absence of norms, no meaning can be added to the score obtained on the test.


Practicability / Usability
A test must also be practicable/usable from the point of view of the time taken in its completion, length, scoring, etc. In other words, the test should not be lengthy and the scoring method must not be difficult nor one which can only be done by highly specialized persons. In addition, the test should be economical from the point of view of money also.

3. The item writer must be familiar with different types of items along with their advantages and disadvantages. He must also be aware of the characteristics of good items and the common probable errors in writing items.
4. The item writer must have a large vocabulary. He must know the different meanings of a word so that confusion in writing the items may be avoided. He must be able to convey the meaning of the items in the simplest possible language.
5. After writing down the items, they must be submitted to a group of subject experts for their criticisms and suggestions, following which the items must then be duly modified.
The item writer must also cultivate a rich source of ideas for items. This is because ideas are not produced in the mind automatically; rather, they require certain factors or stimuli. The common sources of such factors are textbooks, journals, discussions, questions for interview, course outlines, and other instructional materials. After the items have been written down, they are reviewed by some experts or by the item writer himself and then arranged in the order in which they are to appear in the final test. Generally, items are arranged in an increasing order of difficulty, and those having the same form (say, alternative form, matching, multiple-choice, etc.) and dealing with the same contents are placed together.


GENERAL STEPS OF TEST CONSTRUCTION 
Before the real work of test construction begins, certain broad decisions are taken by the 
investigator. These preliminary decisions have far-reaching consequences. It is at this preliminary 
stage that the test constructor outlines the major objectives of the test in general terms, and 
specifies the populations for whom the test is intended. He also indicates the possible conditions 
under which the test can be used and its important uses. For example, a test constructor may 
decide to construct an intelligence test meant for students of the tenth grade broadly aiming at 
diagnosing the manipulative and organizational ability of the pupils. Having decided the above 
preliminary things, he must go ahead with the following steps.

1. Planning of the test 

2. Writing items of the test 

3. Preliminary administration (or the experimental try-out) of the test 

4. Reliability of the final test 

5. Validity of the final test 

6. Norms of the final test 


7. Preparation of manual and reproduction of the test 


Validity of the Final Test

Validity refers to what the test measures and how well it measures. If a test measures a trait that it intends to measure well, we say that the test is a valid one. After estimating the reliability coefficient of the test, the test constructor validates the test against some outside independent criteria by comparing the test with the criteria. Thus, validity may also be defined as the correlation of the test with some outside independent criteria. Validity should be computed from the data obtained from the samples other than those used in item analysis. This procedure is known as cross-validation. There are three main types of validity: content validity, construct validity and criterion-related validity. The usual statistical techniques employed in computing validity coefficients are Pearsonian r, biserial r, point-biserial r, chi-square, phi-coefficient, etc. The abac tables have also been prepared by Flanagan for directly reading the values of point-biserial r and phi-coefficient when the proportions of those passing an item in the lower group and upper group are known. (A detailed discussion regarding these types of validity and the statistical techniques employed appears in Chapter 6.)


Preliminary Administration (or the Experimental Try-out) of the Test

When the items have been written down and modified in the light of the suggestions and criticisms given by the experts, the test is said to be ready for its experimental try-out. The purpose of experimental try-out or preliminary administration of the test is manifold. According to Conrad (1951), the main purpose of the experimental try-out of any psychological and educational test is as given below:

1. Finding out the major weaknesses, omissions, ambiguities and inadequacies of the items. In other words, try-out helps in identifying the ambiguous items, the nonfunctioning distractors in multiple-choice items, very difficult or very easy items and the like.
2. Determining the difficulty values of each item which, in turn, helps in selecting items for their even and proper distribution in the final form.
3. Determining the validity of each individual item. The experimental try-out helps in determining the discriminatory power of each individual item. The discriminatory power here refers to the extent to which any given item discriminates successfully between those who possess the trait in larger amounts and those who possess it in the least amount.
4. Determining a reasonable time limit of the test.
5. Determining the appropriate length of the test. In other words, it helps in determining the number of items to be included in the final form.
6. Determining the intercorrelations of items so that overlapping can be avoided.
7. Identifying any weakness and vagueness in directions or instructions of the test as well as in the fore-exercises or sample questions of the test.

For achieving these aims of experimental try-out, Conrad (1951) recommended at least three preliminary administrations of the test. The aim of the first administration is to detect any gross defects, ambiguities, and omissions in items and instructions. For the first administration, the number of examinees should not be less than 100. He refers to the first try-out as the "pre-try-out". The aim of the second preliminary administration is to provide data for item analysis, and for this the number of examinees should be around 400. Conrad calls this second try-out "the try-out proper". The sample for this must be similar to those for whom the test is intended. Item analysis is a technique of selecting discriminating items for the final composition of the test. It aims at obtaining three kinds of information regarding the items: (i) the difficulty value of the item, (ii) the discrimination index of the item, and (iii) the effectiveness of distractors. (A detailed discussion of item analysis appears in Chapter 4.) The third preliminary administration is carried out to detect any minor defect that may not have been detected by the first two preliminary administrations. Conrad calls this third try-out the "final trial administration". At this stage, the items are selected after item analysis and they constitute the test in the final form. The "final trial administration" indicates how effective the test will really be when it is administered on the sample for which it is really intended. Thus the third preliminary administration would be a kind of 'dress rehearsal' providing a sort of final check on the procedure of administration of the test and its time limit. After the final trial administration is over, no material change is ordinarily to be made in the test.
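
The first two kinds of item-analysis information named above can be illustrated with a short sketch. This is only one common way of computing them, stated under assumptions that are not in the text: items are scored 0/1, the difficulty value is taken as the proportion of examinees passing the item, and the discrimination index is the difference between the proportions passing in an upper and a lower group formed from total scores (the 27 per cent group size is a conventional default chosen for this example). The function name `item_analysis` is hypothetical.

```python
def item_analysis(responses, group_fraction=0.27):
    """Return a (difficulty, discrimination) pair for every item.

    responses: one list per examinee, each holding 0/1 scores for the items.
    group_fraction: share of examinees placed in the upper and in the lower
    group (an assumed convention, adjustable by the caller).
    """
    n = len(responses)
    k = max(1, round(n * group_fraction))
    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for i in range(len(responses[0])):
        # Difficulty value: proportion of all examinees passing item i.
        difficulty = sum(r[i] for r in responses) / n
        # Discrimination index: upper-group minus lower-group proportion passing.
        p_upper = sum(r[i] for r in upper) / k
        p_lower = sum(r[i] for r in lower) / k
        results.append((difficulty, p_upper - p_lower))
    return results


# Hypothetical try-out data: four examinees, three items (invented values).
tryout = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for number, (difficulty, discrimination) in enumerate(
        item_analysis(tryout, group_fraction=0.5), start=1):
    print(f"item {number}: difficulty={difficulty:.2f}, discrimination={discrimination:.2f}")
```

An item passed by nearly everyone or by almost no one, or one with a discrimination index near zero, would be a candidate for revision or removal before the final form is composed.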

Although the procedures recommended by Conrad for preliminary administrations have been widely appreciated, they have not been followed as a fixed rule. As a matter of fact, the number of preliminary administrations and the number of examinees for each administration have varied widely depending upon the nature of the test as well as upon the purpose of the test.


Norms of the Final Test
Finally, the test constructor also prepares norms of the test. Norms are defined as the average performance or score of a large sample representative of a specified population. Norms are prepared to meaningfully interpret the scores obtained on the test for, as we know, the obtained scores on the test themselves convey no meaning regarding the ability or trait being measured. But when these are compared with the norms, a meaningful inference can immediately be drawn. The common types of norms are the age norms, the grade norms, the percentile norms, and the standard score norms. Not all of these types of norms are suited to all types of tests. Keeping in view the purpose and type of test, the test constructor develops a suitable norm for the test. The preliminary considerations in developing norms are that the sample must be representative of the true population; it must be randomly selected; and it should preferably represent a cross-section of the population. (A detailed discussion of the different types of norms appears in Chapter 7.)
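
To show how a raw score gains meaning once it is compared with norms, the sketch below converts a score into a percentile rank and a standard (z) score against a small norm sample. The norm scores are invented, the function names are hypothetical, and the percentile-rank convention used here (percentage scoring below, plus half of those scoring exactly the same) is one common choice assumed for this example, not a definition taken from the text.

```python
from statistics import mean, pstdev


def percentile_rank(score, norm_scores):
    """Percentage of the norm sample below the score, counting ties as half."""
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


def standard_score(score, norm_scores):
    """z score: distance from the norm-sample mean in standard deviation units."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)


# Hypothetical norm sample of raw scores (invented for illustration).
norm_sample = [10, 20, 30, 40, 50]

raw = 40
print(f"raw={raw}  percentile rank={percentile_rank(raw, norm_sample):.1f}  "
      f"z={standard_score(raw, norm_sample):.2f}")
```

A real norm table would rest on a large, representative and randomly selected sample, as the paragraph above requires; five scores are used here only to keep the arithmetic easy to follow.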


Preparation of Manual and Reproduction of the Test

The last step in test construction is the preparation of a manual of the test. In the manual the test constructor reports the psychometric properties of the test, norms and references. This gives a clear indication regarding the procedures of the test administration, the scoring methods and time limits, if any, of the test. It also includes instructions as well as the details of arrangement of materials, that is, whether items have been arranged in random order or in any other order. In general, the test manual should yield information about the standardization sample, reliability, validity, scoring as well as practical considerations. The test constructor, after seeing the importance and requirement of the test, finally orders for printing of the test and the manual.


Reliability of the Final Test
When, on the basis of the experimental or empirical try-out, the test is finally composed of the selected items, the final test is again administered on a fresh sample in order to compute the reliability coefficient. The size of the sample for this purpose should not be less than 100. Reliability is the self-correlation of the test and it indicates the consistency of the scores in the test. There are three common ways of calculating the reliability coefficient, namely the test-retest method, the split-half method, and the equivalent-form method. Besides these, the Kuder-Richardson formulas and the Rulon formula are also used in computing the reliability coefficient of the test. (A detailed discussion of these methods appears in Chapter 5.)


USES AND LIMITATIONS OF PSYCHOLOGICAL TESTS AND TESTINGS


Psychological tests are widely used for many purposes. It is very convenient to distinguish the following five uses of tests:


1. In classification: Psychological tests are popularly used in making classification of persons, that is, for assigning the persons to one category rather than to another one. There are different types of classification, each one giving emphasis upon a particular purpose in assigning persons to categories. Important types of categories are placement, screening, certification and selection, where psychological tests play a significant role in each of these types.

Placement refers to the sorting of persons into appropriate programmes according to their needs or skills. With the help of psychological tests, students are placed into appropriate university programmes, some of them in the science faculty and some others in the social science faculty. The university enrolls some students for mathematics, physics and chemistry whereas others are enrolled for geography, history and political science. Without the help of psychological tests, it is not possible to do such placement.

Screening refers to the procedures of identification of persons with special characteristics or needs. With the help of psychological tests, psychometricians often screen persons into creative persons and persons having exceptional talent in reasoning. They administer the tests and, on the basis of the scores obtained, are able to screen them in desired categories.

Certification and selection are also done with the help of psychological tests. Certification implies that an individual has at least a minimum proficiency in some discipline or activity. When a person passes a certification examination, it confers some privileges. For example, when a driver passes a driving examination, he obtains a driving licence. This illustrates the process of certification. Selection is very much similar to certification because it also confers some privileges on the part of the persons who have been selected. These tasks are well accomplished with the help of psychological tests. The persons who are selected on the basis of the test scores, for example, get admission into a certain course or gain employment in an organization.

2. In diagnosis and planning for treatment: Psychological tests play a significant role in making diagnosis and in planning for treatment. Diagnosis involves determining the nature of a person's abnormal behaviour and classifying the behaviour pattern within an accepted system. Intelligence tests are considered important for diagnosis of mentally retarded children. Likewise, through some psychological tests diagnosis of learning disability can easily be done. With the help of MMPI, a clinical psychologist diagnoses persons with pathological traits. A proper diagnosis not only provides assignment of a label, but also the choice for treatment. When a person is diagnosed as mentally retarded or having learning disability, a planning for his treatment is accordingly done so that the maximum help can be rendered.

3. In self-knowledge: Psychological tests are also useful in providing self-knowledge to the test takers to the extent that such knowledge tends to change their career path. Every administration of a psychological test gives feedback to the test takers regarding the level of the trait/ability being assessed. As a consequence, they bring a change in the desired direction and mould their path for betterment.

4. In evaluation of programmes: Psychological tests are often used in evaluation of various types of educational and social programmes. In schools and colleges, different types of programmes for betterment of academic achievement are carried out and the persons concerned want to know about their impact. Such impacts are easily assessed with the help of various types of achievement and intelligence tests. Likewise, people in general and political parties in particular want to assess the outcome of a social programme carried out for the purpose of, say, verifying the IQ levels of a disadvantaged group. This is also done with the help of various types of psychological tests.

5. In theoretical and applied branches of behavioural research: Psychological tests are very useful in research. They are frequently used in both theoretical and applied researches. With the help of such tests, psychologists frequently investigate theoretical matters that have no immediate practical application. Take the example of Witkin (194…), who developed the tilting-room-tilting-chair test (TRTC). In fact, TRTC stimulated a great deal of research on personality development but was seldom applied to any practical problems of testing. Take an example from an applied field. Suppose neuropsychologists wish to test the hypothesis that a low level of lead absorption produces behavioural deficits in children. This hypothesis can be easily tested by examining lead-burdened children and normal children with the help of psychological tests. With the help of various types of psychological tests, it has been reported that low-level lead absorption in children produces a decrement in IQ, impairment in reaction time and an increase in undesirable classroom behaviour. This automatically shows that psychological tests are useful in applied research too and there should not be any debate about the validity of testing-based research findings.

Thus, we see that psychological tests are used in varieties of fields. Often students remain curious to know from where to obtain information about psychological tests in particular. Here, it can be said that information about psychological tests can be obtained from five sources: publisher's catalogues, reference books, journals, databases and manuals. The best single reference source of information is the Mental Measurements Yearbook (MMY), which is reviewed time and again to incorporate the new tests. Information can also be had from the catalogues of major test publishers. Appendix C provides the names of foreign and Indian test publishers. Information about tests can also be had from various journals. Appendix D provides a list of important journals in the field of behavioural researches.

Despite their various uses, psychological tests are not without limitations. Some of the important limitations are as under:

(i) Psychological tests represent an invasion of privacy: Psychological tests may be an invasion of privacy if they are used without the permission of the testees to obtain personal and sensitive information.

(ii) Psychological tests permanently categorize the persons: On the basis of performance on psychological tests, the testees or examinees are categorized as mentally retarded, gifted, brain-damaged, etc., and the authorities may continue to treat them as such, disregarding evidence of any further change. This has a serious implication for the examinees. The examinees can definitely change and great care should be taken in the interpretation of the test results.

(iii) Psychological tests measure only limited and superficial aspects of behaviour: It is said that psychological tests cannot measure the most important human traits. They force the examiners to take decisions based on superficial and relatively unimportant criteria.

(iv) Psychological tests create anxiety: Generally, it has been reported that when the assessment is to be done through psychological tests, the examinees feel anxious and this anxiety affects their performances. However, the examinees who are familiar with specific types of tests are less anxious than those who are unfamiliar with the test contents (Sax, 1974).

(v) Psychological tests penalize bright and creative examinees: Psychological tests are insensitive to atypical and creative responses. Such responses are not given much credit, thus discriminating against the talented examinees.

Thus psychological tests have some limitations.


ETHICAL ISSUES IN PSYCHOLOGICAL TESTING 


As we know, psychological testing refers to all the possible uses, applications and underlying important concepts of psychological and educational tests. To maintain its proper uses and applications, the American Psychological Association (APA) officially adopted a set of standards and rules in 1953 which have undergone continual review and refinement. The current version, called Ethical Principles of Psychologists and Code of Conduct (APA 1992), consists of a preamble and six general principles which guide psychologists towards the highest ideals in their profession. In addition, it also provides eight ethical standards with enforceable rules for psychologists who are working in different contexts. The APA Committee on Psychological Tests and Assessment (CPTA) is especially designed for considering problems regarding sound testing and assessment practices and for providing various types of technical advice.


The major ethical or moral issues relating to psychological testing can be described under the following five headings:

1. Issues of human rights: Today the field of psychological testing has been influenced by the recognition of various types of human rights. Among these rights is the right not to be tested. In fact, persons who don't want to subject themselves to testing ethically can't be forced to accept this. Moreover, individuals who finally decide to subject themselves to testing have a right to know their test scores, their interpretations as well as the basis of any decisions that affect their lives. In the name of guarding the security of tests, we can't deprive the test takers (or subjects) of the right to know the basis of such decisions. Likewise, these days other human rights such as the right to know who will have access to the data of psychological testing and the right to confidentiality of test results are being popularly discussed. Test interpreters have an ethical obligation to provide protection to these human rights whereas potential test takers are responsible for demanding their rights. Such awareness of human rights today is casting a very important influence on psychological testing and also shaping its future.

2. Issue of labelling: On the basis of psychological testing, a person is given a certain label or diagnosed as having a certain psychiatric disorder. This labelling has many harmful effects. For example, suppose a person has been diagnosed with chronic schizophrenia, which, in fact, has little chance of being cured. Labelling someone a chronic schizophrenic may be a self-fulfilling prophecy. Since this disorder is regarded as incurable, nothing can be done, and when nothing can be done, why should one bother to provide help to such a person? Because no help is given, the person remains a chronic case. Thus, labelling can stigmatize a person for life and it also affects one's access to help. Such labelling creates additional problems. When a person is labelled as schizophrenic, it automatically implies that he is not responsible for it because schizophrenia is a disease or illness and nobody can be blamed for becoming ill. Therefore, such labelling will make him passive and will leave no incentive to alter the negative conditions surrounding his life. Therefore, labelling will not only stigmatize persons but it will also lower tolerance for stress and make treatment difficult. In view of these potential negative effects and dangers of labelling, a person should have the right not to be labelled.

3. Issues of invasion of privacy: When people respond to items of psychological tests, they have little idea of what is being revealed by their responses but somehow, they feel that their privacy has been invaded. Such a feeling is definitely detrimental for people.

In fact, there are two sides to this issue. Dahlstrom (1969) investigated this issue in detail and pointed out two related aspects of it. First, he pointed out that this issue of invasion of privacy is based on serious misunderstandings. In fact, psychological tests have very limited and pinpointed aims and they can't invade the privacy of the persons. Another aspect of this issue, again pointed out by Dahlstrom (1969), is the ambiguous nature of the notion of invasion of privacy itself. In reality psychologists don't consider it wrong, evil or even detrimental to find out or collect information about the person. The person's privacy is invaded when such information is used inappropriately or wrongly. Psychologists are ethically and even legally bound to maintain confidentiality and they don't reveal any more information about a person than is necessary to accomplish the purpose for which the testing was started. In fact, the ethical code of APA (1992) has included confidentiality, which obviously dictates that personal information obtained by the psychologist from any source is communicated to others only with the person's consent. Exceptions to this exist only in those circumstances in which withholding information may cause danger to the person or society, as well as in cases where records have been subpoenaed.


4. Issues of divided loyalties: This is one of the vital issues of psychological testing and was first pointed out by Jackson and Messick (1967), and even today it remains a central problem in the field of psychological testing. In fact, divided loyalties is today a major dilemma for psychologists who use tests in different fields such as industry, schools, clinics, government, military, and so on. A psychologist has to face a conflict, which arises when the individual's welfare is on the one hand and that of the institution that employs the psychologist is on the other. For example, suppose a psychologist working for an industrial firm to identify individuals who are accident prone has a responsibility towards the institution to identify such persons as well as the responsibility to protect the rights and welfare of the persons seeking employment. Here, the psychologist's loyalty is divided. The psychologist has to maintain test security at any cost but he [...] of an adverse decision. However, if this basis is explained [...] to the persons with the same problem, [...] psychologist is trapped with two opposing [...].

5. Responsibility of test constructors and test users: Ethical issues also put some responsibilities on test constructors, or developers, and test users. In fact, the test constructor is responsible for providing all the necessary information. Latest standards for test use state that test constructors must provide a test manual which clearly states the appropriate uses of the test, including data relating to reliability, validity, and norms, and clearly specifies the scoring and administration standards (AERA*, APA & NCME*, 1999).

There are some responsibilities which lie with test users too. According to APA (1974), almost any test can be useful if it is used in the right circumstances but even the best test can hurt the subject if it is used inappropriately. To minimize such potential damage, APA (1974) makes users of the tests responsible for knowing the reason for using the test, as well as the consequences of using the test. It also makes test users responsible for maximizing the test's effectiveness and minimizing unfairness, if any. In other words, test users must possess sufficient and adequate knowledge to understand the basic principles underlying the test construction and supporting research of any test they administer. They must also be aware of the psychometric qualities of the test being used as well as the relevant literature. At any cost, a test user cannot claim ignorance. The test user is responsible for finding out relevant and pertinent information before using any test.

These ethical issues, in fact, put psychologists on vigil so that they must guard both the interest of the test as well as the welfare of the test takers.

STEPS IN SELECTING APPROPRIATE PUBLISHED TESTS

As we know, published tests play a significant role in research as well as in any educational programme of the school. Therefore, they need to be selected with utmost care. When published tests are carefully selected and appropriately used, they make a valuable contribution to the research outcomes. On the other hand, when they are selected hastily or casually, they seldom make any significant contribution because they provide inadequate or inappropriate information. The following sequence of steps must be followed in selecting a published test:

1. Defining the needs of testing: The first step in selecting published tests is to define specifically the purpose of testing and the type of information being sought through testing. The purpose of the testing must be clearly defined and the test should match the purpose. In any given content area, there are numerous tests and each measures somewhat different aspects of knowledge and skill. To make a proper and appropriate selection, the researcher must first identify the objectives and specific outcomes of the research programme. This is, in fact, necessary in choosing relevant tests, whether selecting a single test or a battery of tests for research or testing programmes.


2. Narrowing the choice: The need for published tests should not be considered in isolation, but rather in relation to the total measurement programme. In fact, this makes it possible to choose tests that supplement and complement the other means of [...]. For example, if a scholastic aptitude test is to be used for diagnosing students needing [...] read, it is desirable to replace the aptitude test with a diagnostic reading test. There are other factors that help narrow the choice. For example, if the test user (or [...]) lacks the experience to administer individual tests, only the group tests should be considered. If the tests are to be administered by teachers who lack experience in test administration, tests with simple directions [...] would be the appropriate choice. Similarly, if [...] both elementary and high school levels, only those achievement tests [...] all grade levels should be given due consideration.

* AERA stands for American Educational Research Association and NCME stands for National Council on Measurement in Education.

3. Locating suitable and appropriate tests: When the researchers have [...] tests to a reasonable number, specimen sets should be obtained and examined so that the manuals as well as the items of the test may be evaluated properly.
manuals 45 _ ia test materials: A test manual provides several types of Information | 
4. Reve yopriateness and related technical qualities of the test. A good test manua| ame | 
A é (hed I : = ring ah ae 1 r h 
eile ‘de the following types of intormation: Q | 
oinelr ea . | : 
(i) Major uses for which test is recommended _ | 
(ji), Qualifications essential for administering and interpreting the test 


ene ae | 
(iii) Evidence tor reliability of recommended uses 


ye u 
, re 
tests; ¥ spich ate - 


tests [0 4 


a lis | 
that th, Noy | 





| | 
(iv) Evidence for validity of recommended uses 


(vi) Adequate norms for interpreting the scores. | 


Besides reviewing the test manuals, it is also wise to study carefully the individual test iterys | 
and the best method of doing this is to try to answer each item as if you were taking the teg. | 
Although this is a time-consuming process, there is no other more better means of determining | 
how appropriate a test is for measuring the knowledge or ski \. | 

5. Using a test evaluation form: At the last step, the researcher should use a tag | 
evaluation form, which makes it easier to gather information about a specific test. In fact, test | 
evaluation form guarantees that the pertinent information will not be overlooked. Such form also | 


provides for a summary comparison of each test’s strengths and weaknesses. A specimen of teg | 
evaluation form is presented below. 


(y) Directions for administering and scoring the test 


Test Evaluation Form

Title of Test:
Publisher:
Publication date:
Author(s):
Purpose of Test:
Forms:
Types of scoring:
Administration time:

Technical Features:
• Reliability: Nature of evidence (internal consistency, external consistency, standard error of measurement, etc.)
• Validity: Nature of evidence
• Norms: Type and appropriateness to local situation

Practical Features:
• Ease of scoring and interpretation
• Ease of administration (procedure and timing)
• Adequacy of test manual and related materials

General Evaluation:
• Summary of strengths
• Summary of weaknesses
• Comments of reviewers
• Recommendation regarding local use
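For readers who keep such forms electronically, the specimen form can be sketched as a simple record. This is only an illustration: the class name, the field names and the example test title are assumptions made here, not part of the form itself. The `missing` helper reflects the point made above that a form helps ensure pertinent information is not overlooked.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class EvaluationForm:
    """One record per published test under review (field names are illustrative)."""
    title: str = ""
    publisher: str = ""
    publication_date: str = ""
    authors: str = ""
    purpose: str = ""
    forms: str = ""
    scoring_types: str = ""
    administration_time: str = ""
    # Technical features
    reliability_evidence: str = ""
    validity_evidence: str = ""
    norms: str = ""
    # Practical features
    ease_of_scoring: str = ""
    ease_of_administration: str = ""
    manual_adequacy: str = ""
    # General evaluation
    strengths: str = ""
    weaknesses: str = ""
    reviewer_comments: str = ""
    local_recommendation: str = ""

    def missing(self) -> List[str]:
        """Return the names of entries that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical, partly completed form
form = EvaluationForm(title="Hypothetical Reading Test", publisher="Example Press")
unfilled = form.missing()
```

Completing one such record for each candidate test gives the side-by-side summary of strengths and weaknesses that the form is meant to provide.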


Review Questions

1. Give the meaning of a test in psychology. Point out the major characteristics of a good psychological test.
2. Make a distinction between a teacher-made test and a standardized test.
3. Outline a plan for classifying a psychological and an educational test.
4. Describe, with examples, the general steps in construction of a psychological test.
5. What is a psychological test? Describe the essential characteristics of a good psychological test.
6. Discuss the nature of standardization of a test. Explain the different aspects taken into consideration for the standardization of a test.
7. Discuss the important uses and limitations of psychological tests.
8. Discuss the important ethical issues involved in psychological testing.
9. Discuss the major steps in selecting appropriate published test.





3
ITEM WRITING 





CHAPTER PREVIEW
• Meaning and Types of Items
  (i) Essay Item
  (ii) Objective Item
• Difference between Essay-Type Tests and Objective-Type Tests
• General Guidelines for Item Writing
• General Methods of Scoring Objective-Test Items
  (i) Overlay-key Method
  (ii) Strip-key Method

MEANING AND TYPES OF ITEMS 


For a test to be sound, the items must be good. Bean (1953, 15) defines an item as "a single task or question that usually cannot be broken down into any smaller units". An arithmetical problem may be an item; a manipulative task may be an item; a mechanical puzzle may be an item; likewise, sleeplessness may also be an item of a test. It is in this sense that Nunnally referred to items as the 'lowest common denominator of a test', which is scored.

An item must have the following characteristics:


1. An item should be phrased in such a manner that there is no ambiguity regarding its meaning for both the item writer as well as the examinees who take the test.
2. The item should not be too easy or too difficult.
3. It should have discriminating power, that is, it must clearly distinguish between those who possess a trait and those who do not.
4. It should not be concerned with the trivial aspects of the subject matter, that is, it must only measure the significant aspects of knowledge or understanding.
5. As far as possible, it should not encourage guesswork by the subject.
6. It should not present any difficulty in reading.
7. It should not be such that its meaning is dependent upon another item and/or it can be answered by referring to another item.

The item writer is also frequently faced with the problem of determining the exact number of items. As a matter of fact, there is no hard and fast rule regarding this. Previous studies have shown that the number of items is usually linked with the desired level of reliability coefficient of the test. Studies have revealed that usually 25 to 30 dichotomous items are needed to reach a reliability coefficient as high as 0.80 whereas 15 to 20 items are needed to reach the same reliability when multipoint items are used. These are the minimum numbers of items which should be retained after item analysis. An item writer should always write almost twice the number of items to be retained finally. Thus, if he wants 30 items in the final test, he should write 60 items. In the speed test, the number of items to be written is entirely dependent upon the intuitive judgement of the test constructor. On the basis of his previous experiences, he decides that a certain number of items can be answered within the given time limit.

An item writer must also have sufficient knowledge regarding the different types of items as well as the merits and demerits of each type. In a broad sense, items are of two types:

1. Essay item or free-answer item
2. Objective item or 'new type' of item
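The link between the number of items and the reliability coefficient mentioned above is commonly expressed through the Spearman-Brown prophecy formula. The text does not name this formula, so the sketch below is offered only as an illustration under that assumption; the function names and the starting figures (a 15-item test with a reliability of 0.67) are hypothetical. The last helper applies the "almost twice" rule of thumb for drafting items.

```python
import math


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`
    with comparable items (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def length_factor_needed(current: float, target: float) -> float:
    """Factor by which the test must be lengthened to move from the
    `current` reliability to the `target` reliability."""
    return (target * (1 - current)) / (current * (1 - target))


def items_to_draft(final_items: int, ratio: float = 2.0) -> int:
    """Items to write before item analysis (rule of thumb: about twice)."""
    return math.ceil(final_items * ratio)


# Hypothetical starting point: 15 items with reliability 0.67, target 0.80
current_items = 15
factor = length_factor_needed(0.67, 0.80)
needed_items = math.ceil(current_items * factor)
draft_count = items_to_draft(needed_items)
```

The formula assumes that the added items are of the same quality as the existing ones, which is one reason why more items are drafted than are finally retained.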


Essay Item or Free-answer Item 


An essay item is one in which the examinee relies upon his memory and past associations to answer the questions in [...] words only. Since such items can be answered in whatever manner one likes, they are also known as free-answer items. Essay items are most appropriate for measuring higher mental processes which involve the process of synthesis, analysis, evaluation, organization and criticism of the events of the past. Essay tests are thus suitable for measuring traits like critical thinking, originality and the ability to integrate or synthesize or analyze [...] events. Essay items are of two types: short-answer type and long-answer type or extended-answer essay type. A short-answer essay item is one where the examinee supplies the answer in one or two lines and is usually concerned with one central concept (Marshall & Hales, 1972). A long-answer essay item is one where the examinee's answer comprises several sentences. Such an item is usually concerned with more than one central concept. The following examples illustrate the two varieties of essay items:

1. Explain the meaning of reliability of an educational test. (Short-answer essay item)
2. Describe the methods of estimating reliability and validity of an educational test. (Long-answer or extended-answer essay item)

Nowadays essay items are frequently used by teachers to measure the achievement of pupils 
in a classroom. The primary merit of such items is that they encourage the examinee to give a 
coherent and organized picture of his memory and past associations. In other words, they require 
the examinees to organize and produce the answer rather than recognize the answer. Hence, a 
better inference can be drawn regarding the examinee on the basis of these answers. The most 
serious drawback of such an item, however, is that marking is highly unreliable. It varies from 
scorer to scorer and sometimes within the same scorer, when he is asked to evaluate the answer at 
different time intervals. Not only this, scoring of the essay items takes a longer time because of the 
length of the answer and also because the scorer is required to read each line very carefully. Thus 
essay items lack scoring economy, too. Despite these weaknesses, essay items are popular 
measures of educational achievement in the classroom.

Essay test items present two important difficulties. First, factual knowledge may sometimes be confounded with the ability to organize and synthesize an answer. The examinee who knows more facts correctly may write an answer that looks better and superior even when originality is missing from the answer. Second, in comparison to objective items where the examinee is required only to select from alternatives, here in essay items, he is required to think out an answer and compose a written response. Naturally, this takes more time. As a consequence, differences in performance can depend on the ability to write quickly in correct [...]. Not only this, due to the greater time required to compose an answer, very few questions can be asked and this may lower the content validity of the essay test.

To combat these problems, several variations in the testing [...] have been proposed by psychologists and educators. Following are the important ones:

(i) Open-book examination: In this format, the examinees are permitted to use textbooks or class notes for answering the questions. This technique reduces the impact of differences in knowledge. The demerit of this method is that the less able examinees may devote much time to searching material during the test whereas the more able examinees devote more time to composing the answers. Thus, the examinees who devote more time to searching materials in books or notes may feel that they have less time than if the examination had been on the closed-book pattern.

(ii) Take-home examination: As its name implies, in this [...] the examinees don't have time pressure during the test and it allows the examinees to [...] completely unsupervised situation as long as he or she wishes (or has the time to do). The drawback of this format is that it sacrifices test security. In fact, there is no way to know [...] the examinee has worked independently. This apprehension becomes still stronger [...] internet information and use of cell phones.

(iii) Cheat sheets: The cheat sheet is one variation of the open-book examination [...] the examinee is allowed to bring to the examination one piece of paper with as many facts, [...] points or other information as desired. This technique has been found to work [...] where formulas or several small points or facts constitute a large part of the subject matter. This technique reduces dependence on the textbooks.

(iv) Study questions: This technique combines features of both the open-book examination and the take-home examination. Here the examinee is provided with a list of study questions from which examination questions will be set or drawn. For example, the examinee may be provided with a list of 10 to 15 questions and told that the examination will contain 4 or 5 of them. This technique has the advantage that the examinees can take as much time as they wish for preparing for the examination and can also use whatever resources are available to them intensively.

In order to write good essay items, the following suggestions must be kept in mind:

1. An essay item must contain explicitly defined problems. Usually, essay items are intended to measure the higher mental processes. As such, it is essential that they contain problems in clear-cut and explicit terms so that every examinee interprets them in more or less the same way. An essay item is not said to be valid if its interpretation varies.

2. It must contain such problems whose answers are not very wide. In case a student is asked to answer a problem with a larger content area, he may start writing whatever he knows, without making any discrimination. In such a situation he may not write about the facts or information needed by the item, thus lowering the validity of the essay item.

3. Essay items must have clear-cut directions or instructions for the examinees. The instruction should indicate the total time to be spent on any particular question, what type of information is required, and the likely weightage to be given to each item so that the examinees may pick up the relative importance of the essay questions and accordingly, adjust the length of the answers.

4. Sufficient time should be allowed in the construction of the essay items. Such items measure the higher mental processes and in order that they actually measure what they intend to measure, it is essential that they are carefully worded and ordered so that they are interpreted in the same way by all.

5. One should start an essay item/question with phrases such as [...], 'Criticize', 'Explain how', 'Differentiate', etc. The use of these words or phrases [...] the examinees with a task requiring them to select, organize and apply information. Essay items should not be started with 'what', 'when', 'who' or 'list' because such words encourage examinees only towards the production of facts.

6. The item writer must clearly know what mental processes he wants to assess before starting to write the item/questions. He must fully understand the kinds of response that represent the abilities that he wishes to assess. Once the abilities or competencies have been decided, he can select, adopt or create items/questions that will require the examinees to display those abilities.

















7. The item writer should use novel material or novel organisation of material in phrasing the questions. Essay items always intend to test the examinees' ability to use their information. For this, it is essential that the item writer must put them in situations where they are required to do more than merely reproduce the materials from the textbook or elsewhere.

8. The item writer should write the essay item in such a way that the task is very clearly defined for all examinees. Here, the aim should be that each examinee's score must reflect how well he can do the given task and not how well he can figure out what the task is.

9. When the essay item relates to a controversial issue, it should be evaluated in terms of the presentation of evidence for a position rather than only for taking that position. For example, an essay item such as 'Explain how the idea of live-in relationship fits with Indian culture' should be evaluated in terms of the presentation of evidence for the position taken by the examinee and not for the position taken only.

When the item writer has written good essay items, he has to do several things that assure that the test will function in the way he wants it to. Following are some of the guidelines that help the item writer prepare the essay test in such a way that will help eliminate some common but unwanted effects on test scores.

1. In preparing an essay test, the test constructor or the item writer must be sure that there are not too many or too lengthy questions for the examinees to answer in the given time. An essay test should not act as an exercise in writing speed. The examinees must get adequate time to think and compose the answers. The more complex questions demand a longer time to answer.

2. In most situations, all examinees should be required to answer the same questions or the [...] of questions. Allowing the examinees a choice for answering the questions reduces the [...] basis for differentiating among them. In fact, comparing the quality of one examinee's answer to question/item 1 with another examinee's answer to question/item 2 introduces an additional source of subjectivity and measurement error into the process of testing and [...].

3. As far as possible, the test constructor should have a [...] range of complexity and difficulty [...] several essay questions or items. Essay tests are designed to make differentiation among [...] in the degree of mastery over the content area. If all questions/items are very difficult, some of the less proficient examinees may remain unable to produce satisfactory answers to the [...]. On the other hand, if most questions/items are easy, the test will not be able to [...] the full capacity of the more proficient examinees. Therefore, a range of difficulty should [...] so that the test may have questions appropriate for all examinees.


4. The test constructor should provide a set of general directions for the entire test instead of providing simple and straightforward directions like, "Answer the following questions". A good set of directions should emphasize the following major principles:

(i) The form in which answers are to be provided must be clearly stated.
(ii) A general plan the examinee should take in the class must be stated.
(iii) The general criteria for making evaluation should also be clearly stated.
(iv) The total time for answering all questions should be clearly written.
(v) The score points to be given for each question or portion of a question should clearly be spelled out.

So far as the scoring of the essay items is concerned, it is a difficult task. In scoring the responses to the essay items, the emphasis should not be placed on [...] information because for such information objective items are more suitable than [...]. Here greater emphasis should be given to presentation of thoughts and ideas in a coherent [...] and the final judgements of the examinees. There are two general methods of scoring responses to essay items: the sorting method and the point-score method.

In the sorting method, the answer sheets of the examinees are first sorted into several groups according to the fairness of the answers. For example, one group may be designated as the best-answer group, another the better-answer group, and still another as the poor-answer group. There is no specific rule which limits the number of sorting groups. It is dependent upon the number of examinees and the number of discriminations that can legitimately be made among the different answers to the essay items. After this, the scorer assigns the weightage to be given to each group and accordingly, he gives weightage to each answer and finally adds these weightages together to constitute a total score for each paper. The advantage of the sorting method is that scoring is done quickly and chances for erratic marking are reduced considerably.


The point-score method is well suited to scoring answers to short-answer essay items. In this method, a grading key consisting of the correct answers and the points to be assigned is prepared beforehand by the scorer. The scorer has only to compare the answers of the examinees with the answers or responses of the grading key and assign points. Subsequently, all these points are added together to constitute a total score for all the answers in a given paper. The demerit of the point-score method is that it cannot be applied to the extended or long-answer essay as it is difficult to prepare such a grading key.

In addition to these two methods, some guidelines must be kept in view because these will ease the scoring of essay items. The reality is that a good evaluation of an essay test requires not only a good set of items but also sound and consistent judgement about the quality of answers provided by the examinees. As we know, variations in scoring from one question/item to another question/item and from examinee to examinee contribute to the error in the measurement. Therefore, there are some general guidelines, which if followed, will reduce the probability of committing such errors. These guidelines are as under:


1. The examiner should decide what qualities should be kept in view in judging the adequacy of an answer to each question. If the examiner wants more than one quality to be judged, each one should be evaluated separately. For example, if the examinees are to be assessed for grammar and spelling, these should be evaluated separately from the organisation of facts and from their knowledge. Of course, the examinees should get information about this in the directions of the test.

2. The examiner or the evaluator should read all answers to one question before going on to the next one. This will serve two purposes. First, this will help the evaluator in maintaining a more uniform set of standards across all papers. Second, the evaluator is less likely to be influenced in judging the quality of the answer to one item/question by making a comparison to how well the same examinee has answered the other questions.

3. A model answer or answer guide should be prepared. Such a model answer must contain the points that should be covered to receive full marks or credit.

4. The paper should be graded as nearly anonymously as possible. If the evaluator does not know or knows less about who wrote an answer, he can score the answer more objectively.

5. When one question has been scored for all examinees, the papers should be shuffled before starting the process of scoring the next question. An average examinee who has the misfortune of having his paper scored right after a brilliant examinee is likely to suffer by the comparison. When the papers are shuffled, the order in which papers appear changes from question to question.





6. As far as possible, the evaluator should [...]. The rationale behind this is that when examinees get prompt, specific information about their strengths and weaknesses, this serves as a motivational device for them.
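Guidelines 2, 4 and 5 together describe a marking routine: score one question across all papers, identify papers only by a code, and reshuffle the papers before the next question. A minimal sketch of such a routine follows; the function name, the paper codes and the use of a fixed random seed are assumptions made here for illustration.

```python
import random
from typing import Dict, List


def grading_schedule(
    paper_codes: List[str], questions: List[str], seed: int = 0
) -> Dict[str, List[str]]:
    """For each question, return a freshly shuffled order in which the
    anonymously coded papers are to be read."""
    rng = random.Random(seed)  # fixed seed only so that the plan can be reproduced
    schedule: Dict[str, List[str]] = {}
    for question in questions:
        order = list(paper_codes)  # copy, so the master list is never reordered
        rng.shuffle(order)
        schedule[question] = order
    return schedule


# Hypothetical anonymous codes used in place of examinee names
codes = ["P01", "P02", "P03", "P04", "P05"]
plan = grading_schedule(codes, ["Q1", "Q2", "Q3"])
```

Each question gets its own reading order, so no paper is systematically read just after the same neighbouring paper.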


Objective Item or 'New-type' Item

An objective or new-type item is one wherein there is only one fixed correct answer, which either the examinee gives on his own or he is required to select from among a given few. All objective items can be divided into two broad categories: the supply type and the selection type. The supply type of item is one wherein the examinee is to write down the correct answer on his own. A selection type of item is one wherein the examinee is to select or identify the correct answer from a few given. Nunnally (1970, 119) refers to such items as identification items.

The supply type of item is divided into two main categories: the unstructured short-answer item and the completion item or fill-in item. An unstructured short-answer item is one which is given in a question form and where the examinee supplies the correct answer in terms of a word, number or phrase. A completion item or fill-in item is one which is presented in the form of an incomplete statement. The examinee is supposed to complete the statement by supplying the missing word, number or phrase, etc. Unstructured short-answer items and completion items differ only in form because when an item is written in the question form, it constitutes an unstructured short-answer item but when the same statement is presented in the form of an incomplete statement, it constitutes a completion form of item. The following examples illustrate the two types of items:

1. Who was the first president of the Indian Republic? (Dr Rajendra Prasad) (An unstructured short-answer item)
2. The first president of the Indian Republic was ________. (Dr Rajendra Prasad) (A completion or fill-in item)

Of these two types of items, the completion item is frequently used in the field of 
mathematics, chemistry, physics, and biology where isolated bits of information are to be 
recalled. Completion items are also used in vocabulary tests and where the association between 
dates and events is wanted. Completion items have some advantages. They require the 
examinees to recall their past associations rather than to merely recognize them. It is, therefore, 
essential that the examinee must have a thorough knowledge of the subject matter. In completion 
items, the guessing of the correct answer is reduced to a minimum as there are no options or 
answers given. Completion items have also some disadvantages. Organizational ability requiring 
complex understanding and applications of scientific and theoretical principles in which 
synthesis, analysis and evaluation skills are required, is not ordinarily measured by these items. If 
adequate care has not been taken in writing the completion item, its scoring may not be 
objective. Sometimes, it happens that the correct answer is not acceptable to many experts 
because of the inherent defects in the structure of such items. In such a situation, the scoring 
tends to be subjective. Besides, scoring may get delayed for a considerable time. In scoring
completion items, a scoring key is prepared in which all the correct answers are listed, and the
completion items are scored by matching the examinee's responses with the answers in the scoring
key. Usually, for each correct answer a score of +1 and for each incorrect answer a score of zero is
given. The total score is obtained by adding the individual scores. Generally, no penalty is given
for an incorrect answer as chance plays an insignificant role in such items.
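The scoring procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout of the key and the rule for tidying up answers (lower-casing, dropping full stops and extra spaces) are hypothetical choices and are not prescribed by the text.

```python
def normalise(answer: str) -> str:
    """Lower-case an answer and drop full stops and extra spaces (assumed rule)."""
    return " ".join(answer.lower().replace(".", "").split())


def score_completion(responses: dict, key: dict) -> int:
    """Give +1 for each response found in the scoring key, 0 otherwise.

    `key` maps an item number to the list of answers accepted as correct.
    No penalty is applied for incorrect or missing answers.
    """
    total = 0
    for item, accepted in key.items():
        given = normalise(responses.get(item, ""))
        if given in {normalise(a) for a in accepted}:
            total += 1
    return total


# Hypothetical key for two completion items used in this chapter.
key = {1: ["Dr Rajendra Prasad", "Rajendra Prasad"], 2: ["1913"]}
responses = {1: "rajendra prasad", 2: "1912"}
print(score_completion(responses, key))
```

Listing every acceptable wording in the key before scoring begins is one way of keeping the scoring objective when an item admits more than one correct form of the answer.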




In writing the completion items, the item writer must keep in mind the following:

1. The item must not be ambiguous. Given here are two examples of how an item
   can be worded in an ambiguous form and a clear-cut form.

   Ambiguous form: Behaviourism was founded in ______.

   The above item has two possible answers: '1913' and 'America', both of which
   are correct.

   Clear-cut form: Behaviourism was founded in the year ______.
   or
   Behaviourism was founded in the country known as ______.

2. Clues which give a direct hint to the correct answer must be avoided. For example:

   A person who believes in God is known as a ______ and a person who does not
   believe in God is known as an ______.

   Obviously, in the first blank 'theist' and in the second blank its opposite 'atheist' will
   be filled in. In such items, when the examinee knows the answer of the first blank, he can,
   without much reasoning, answer the second blank because it is the opposite of the
   answer of the first blank.











3. A large number of blanks should not be used in completion items because in that case
   originality (and not the real omissions) would be the only source of response.
   For example:

   Personality is the dynamic organization ______ that ______ his ______.

In the above completion item the examinee is required to give the definition of personality
which in the words of Allport is "the dynamic organization within the individual of those
psychophysical systems that determine his characteristic behaviour and thought". Therefore, the
above item may be completed as "Personality is the dynamic organization within the individual
of those psychophysical systems that determine his characteristic behaviour and thought". In this
way, the original idea may not be conveyed if there are too many blanks.

As we have discussed earlier, another important category of objective items is the selection
type of item where the examinee is required to select the correct answer from among a few given
answers. Selection-type items, as used in psychological, sociological and educational research,
are of many varieties. Some of the important varieties are two-alternative items, multiple-choice
items and matching items.

1. Two-alternative Item
In this type of item, only two answers are provided from which the examinee is required to select
the one which he thinks to be correct. The two-alternative item, also called dichotomous format
item, is further divided into four subcategories.
(a) Yes-No: This is very common. Here items are expressed in the form of a question. For
    example:

    1. Do you suffer from insomnia? Yes/No
    2. Do you feel shy in talking with the members of the opposite sex? Yes/No

(b) True-False: The true-false form of objective item is expressed in the form of a declarative
    statement, which is either entirely true or entirely false. For example:

    1. Wilhelm Wundt is regarded as the father of experimental psychology. True/False
    2. Mahatma Gandhi is popularly known as 'Rashtrapita'. True/False
    3. Alfred Adler was the father of psychoanalysis. True/False


(c) Right-Wrong: The right-wrong form of objective item is expressed in terms of simple
    sentences, which are to be marked as right or wrong, depending upon the correctness or
    incorrectness of the language of the sentence. For example:

    1. He needs not go to cinema. Right/Wrong
       (The sentence is grammatically wrong.)
    2. Mohan said that the sun rose in the east. Right/Wrong
       (The sentence is grammatically wrong.)
    3. He said that honesty is the best policy. Right/Wrong


(d) The Correction Form: In this form, the examinee is required to correct the item by
    providing the right answer. Usually, the wrong part of the item is underlined or written in
    colour or in italics, and the examinee is required to substitute it by a correct answer. For
    example:

    1. The first psychological laboratory was established by Wilhelm Wundt in
       1869. (1879)
    2. January 26 is celebrated as Independence Day in India. (August 15)

Strictly speaking, the correction form of the two-alternative item is neither the selection type
nor the supply type. Rather, it is a combination of the characteristics of both because the
examinee supplies the answer on the basis of selection of responses evoked by the wrong answer
inserted in the item itself.

The two-alternative form of the objective item is mostly suited to personality tests and tests
where noninterpretative information like dates, technical terms, and vocabulary is required. The
main limitation of the two-alternative form is that the probability of guessing the correct answer
is high. Thus in this form of item, the examinee can select a correct answer on the basis of
guessing, for which 50% probability exists. That is the reason why such items are not preferred
in the case of intelligence tests and aptitude tests. However, the advantage of the two-alternative
item is that it is easier to construct and takes less time in scoring and administration.
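The effect of the 50% guessing probability can be made concrete with a small calculation. The following Python sketch assumes that every item is answered by blind guessing, that items are independent, and that each alternative is equally likely to be chosen; the function names and the 10-item test are hypothetical illustrations.

```python
from math import comb


def chance_score(n_items: int, n_alternatives: int = 2) -> float:
    """Expected number of correct answers from blind guessing alone."""
    return n_items / n_alternatives


def prob_at_least(n_items: int, n_correct: int, n_alternatives: int = 2) -> float:
    """Binomial probability of getting at least `n_correct` items right by guessing."""
    p = 1 / n_alternatives
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(n_correct, n_items + 1)
    )


# A 10-item true-false test answered entirely by guessing.
print(chance_score(10))                 # expected score from guessing
print(round(prob_at_least(10, 6), 3))   # chance of 6 or more correct
# The same test with four alternatives per item.
print(round(prob_at_least(10, 6, 4), 3))
```

Under these assumptions a guesser is expected to get half of the two-alternative items right, which is why a larger number of alternatives is preferred where guessing must be kept low.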
There are some general suggestions and guidelines that must be kept in mind at the time of 
writing two-alternative items. 
(a) The statement must be simple and straightforward and not complex and indirect. For
    example:
    Poor form: The experimental study of human behaviour in the laboratory started about
    hundred years ago. True/False
    Improved form: The first experimental study of human behaviour in the laboratory started
    in 1879. True/False

(b) The statement must be either entirely true or entirely false so that only one interpretation
    is possible. For example:
    Poor form: The capital of India is Delhi. True/False
    Improved form: The capital of India is New Delhi. True/False

(c) Double negatives must be avoided because they tend to confuse the examinees. For
    example:
    Poor form: A driver can have no accurate idea of the speed of the vehicle if there is no
    speedometer in it. True/False
    Improved form: In the absence of a speedometer, the driver is unable to get an accurate
    idea of the speed of the vehicle. True/False

(d) The exact authority should be mentioned if the statement of the item is controversial. For
    example:
    Poor form: Intelligence refers to the extent to which a person is able to learn or profit by
    past experience. True/False
    Improved form: According to Ebbinghaus, intelligence is the ability to learn or profit
    by past experience. True/False
    The first statement, which reflects the viewpoint of one authority without referring to his
    name, is an example of the poor form of the true-false item. But when the name of the
    authority is mentioned (as in the second statement), the statement is improved because now
    there remains no doubt about the validity of the statement.

(e) Specific determiners which are likely to help the examinee in finding out the correct
    answer must be avoided. Statements which incorporate such words as never, nothing,
    always, all and none are likely to be more false than true. When such a word is used in a
    false statement, it provides additional clues to the correct answer and thus acts as a
    specific determiner.

    Likewise, statements containing often and sometimes are likely to be more true than
    false. If these are used in true statements only, they provide additional clues to the correct
    answer. Hence, such specific determiners must be avoided or a proper balance of each
    of these two types of specific determiners in true and false statements must be made.

(f) There must not be a fixed sequence of the true statement or the false statement when such
    items are finally arranged in the test. For example, True, True, True, False, True, True,
    True, False and so on will be an unwise sequence of items but True, False, False, True,
    True, False, True, False, True, True is an example of a wise sequence of items.

(g) If a statement has been taken from some book or some other source, it should be 
paraphrased and not reproduced verbatim. The advantage of this is that it will 
discourage any profit by rote memory. If the subject has read the book from which the 
statement has been taken verbatim, he will naturally select the correct answer. But when 
    the language has been changed, the examinee will not be able to profit from his rote
    memory. He can select the correct answer only when he has understood the
    fundamental principles involved therein.


2. Multiple-choice Item
This is the most popular, common, flexible and effective of all objective items. In each item of the
multiple-choice type, there is one stem in the form of a question or incomplete statement, which is
generally followed by a set of three to five answers or options. This is also known as polytomous
or polychotomous format item. The following examples illustrate a multiple-choice item:

1. The theory of psychoanalysis was advanced by
   (a) E L Thorndike (b) I P Pavlov (c) J B Watson (d) S Freud
2. Which of the following is a projective measure of personality?
   (a) Guilford-Zimmerman Temperament Survey (b) Cattell's 16 PF Test (c) Rorschach Test
   (d) Bernreuter Personality Test

The stem, also known as the premise, has two primary functions: it presents the problem and
sets the stage for giving answers by providing an appropriate frame of reference. The set of
answers or alternatives or options provides the field for operation of that frame of reference which
is originated by the stem. Although there is no rule for the exact number of alternatives to be
provided in the set of options, usually four to five alternatives are preferred. In order that all
alternatives appear equally attractive, they must be of about the same length and complexity.





Among all the given alternatives, either one is correct or one is the best answer. The examinee is
required to indicate that correct or best answer. The correct or best answer is called the keyed
answer or keyed response. The other alternatives are called distractors or foils. The aim of foils or
distractors is not to confuse the examinee but to provide a sufficiently plausible alternative, which
he may choose in preference to the keyed alternative if he does not know the correct answer. It is,
therefore, essential that the distractors must not produce any ambiguity. As a matter of fact,
multiple-choice questions are difficult because the distractors and keyed responses are similar.
The greater the similarity, the more difficult is the item.

Multiple-choice items are suited to measuring instructional objectives, which include knowledge
of vocabulary, analysis, synthesis, application of principles, evaluation, ability to interpret data,
association, and ability to identify similarity and difference. Such items have a marked
flexibility: they can be used for measuring almost any educational and instructional objective
except the ability of the examinees to organize and present ideas. Like other objective items, they
can be scored rapidly, accurately, and objectively by even those who have little training in the
testing field.
Another advantage of the multiple-choice item over the two-alternative item is that the respondents
(or testees) cannot receive credit for simply knowing that the statement is incorrect; rather, they
must also know what is correct. Look at the difference in the following two items:

1. Mrs Indira Gandhi was assassinated in 1986. True/False
2. Mrs Indira Gandhi was assassinated in
   (a) 1984 (b) 1986

In the true-false version, the respondents will receive credit if they mark it false even if they think
that Mrs Indira Gandhi was assassinated in some year other than 1984. In fact, if a respondent
thought that she was assassinated in some other wrong year, say 1985, the statement would still be
marked false and he would receive credit for that response. Since marking the statement false does
not show that the respondent knows what is correct, the resulting score tends to be an inadequate
measure of what the respondent knows. But this problem is solved with multiple-choice items
because the respondent must select the correct answer to receive the credit.


Another advantage of multiple-choice item is that it is relatively free from response sets. In 
other words, respondents don’t favour a particular alternative when they don’t know the correct 
answer. Still another related advantage is that using a number of plausible alternatives makes the 
results amenable to some kind of diagnosis. The kind of incorrect alternatives selected by the 
respondents provides clues regarding the factual terms and misunderstandings that need to be 
corrected. 

Another commonly cited advantage of the multiple-choice item over the true-false item is the
higher degree of reliability per item. Since the number of alternatives is increased from two to
three, four or five, the opportunity to guess the correct answer is reduced and therefore, reliability
is correspondingly enhanced. In fact, the effect of increasing the number of alternatives for each
item is similar to that of increasing the length of the test for enhancing the reliability.

Multiple-choice items have a few inherent limitations, too. Such items should not be used to
measure the examinee's ability to organize materials or to present ideas,
or for measuring problem-solving skills in mathematics and science. Multiple-choice items are
difficult to write, and demand a thorough knowledge of the subject matter as well as a
knowledge of the art of item writing. This means that multiple-choice items can be written only
by trained persons. Such items, as compared to the two-alternative items, take a longer time to be
answered by the examinee. Another disadvantage of multiple-choice items is the difficulty in
finding a sufficient number of incorrect but plausible distractors, particularly when
the respondents are at primary level, where their vocabulary and knowledge in the area are
limited.


An item writer must keep in view the following suggestions while writing multiple-choice
items:

(a) The keyed response should be neither shorter nor longer than the other alternatives
    because in either case, the length would provide additional clues to the keyed response.

(b) The keyed response should be randomly varied in its position. Often, it has been
    reported that the item writer, unconsciously or habitually, places the keyed response
    at the top or at the bottom of the set of alternatives. When this is a regular
    pattern, the examinee is likely to pick up the keyed answer and will respond without
    much reasoning. Hence, the order of the keyed response should be randomly changed in
    each item.
(c) The stem of the item must be written in a clear language and it must present a problem,
    the solution to which is to be found from among the given alternatives. For example:
    Poor form: Learning
    (a) is change in attitude
    (b) is change in retention
    (c) is change in behaviour
    (d) is change in belief
    Improved form: Learning ordinarily refers to
    (a) change in attitude
    (b) change in retention
    (c) change in behaviour
    (d) change in belief

In the above two examples, the second one is definitely superior to the first one. In the first
example, the stem does not set any problem but gives only a general heading or topic, which is
followed by one keyed answer and three distractors. But in the second example, the stem clearly
states a problem, the solution to which is answer (c).

(d) Phonological similarity or clang associations between the stem and the keyed answer
    must be avoided. Whenever this occurs, the examinee is likely to pick up this association
    as a possible clue to the keyed answer. For example:
    Poor form: Verbal intelligence test measures intelligence on the basis of
    (a) Manipulative responses (b) Verbal responses
    (c) Oral responses (d) Pictorial responses
    Improved form: Verbal intelligence test measures intelligence on the basis of
    (a) Manipulative responses (b) Written responses
    (c) Oral responses (d) Pictorial responses
    The first example is inferior to the second example because there is a clang association
    between the stem (verbal intelligence test) and the keyed response (verbal responses). When
    'written responses' is substituted for 'verbal responses', the clang association is over and the
    item is improved.

(e) Alternatives must ensure that there is only one correct or best answer. If there is more
    than one correct or best answer, there will be confusion. For example:
    Poor form: In two sets of ranked data, the appropriate measure of correlation is
    (a) Pearson r (b) Biserial r
    (c) Kendall tau (d) Spearman rho
    Improved form: In two sets of ranked data, the appropriate measure of correlation is
    (a) Pearson r (b) Biserial r
    (c) Point-biserial r (d) Spearman rho
    In the above examples, the first example is not a good one because both (c) and (d) are the
    correct answers, but in the second example only (d) is the correct answer and the keyed
    response. That is an improved form of the multiple-choice item.

(f) Negatives should be used sparingly in the stem. Multiple-choice items with negative words
    in the stem generally increase the reading difficulty of the item and require the
    examinees to perform more difficult reasoning. When a negative word is included in the
    stem, it should be underlined, capitalized or italicized for attracting the attention of the
    examinee. Negative items provide little information about exactly what the examinee knows.
    The use of a negative with words like Not or Except is at times justified because it is
    important to test whether the examinees correctly know the exception to a general rule or can
    detect errors.

(g) The item writer must be sure that wrong answers are plausible, that is, persuasive but
    deceptive. As we know, one advantage of multiple-choice items is their capacity to make the
    examinee select the correct answer from among three, four or five alternatives and thus, they
    automatically reduce the chance for guessing the correct answer. This purpose can be
    served in a better way only when the wrong alternatives of multiple-choice items are
    attractive and plausible to the examinees who lack the knowledge that the item is intended
    to assess. Therefore, incorrect answers must be logically consistent with the stem and
    should represent common errors done by the examinees.

(h) The item writer should use the option 'none of these' or 'none of the above' only when
    the key answer can be taken unequivocally as correct or incorrect. In tests like spelling
    tests, mathematics tests and study skill tests an absolutely correct answer can be provided.
    But in other tests, where the examinee is expected to choose the best answer and where
    the keyed answer is the best answer, but not necessarily the absolutely correct answer,
    the use of these two options, that is, none of these or none of the above, is not considered
    appropriate. The option 'none of the above' should be used when the stem is stated as a
    question rather than a sentence to be completed. For example:
    Poor form: Of the following, the one that is not included in the parts of human brain is:
    (a) Chromosomes (b) Plasma
    (c) Platelet (d) None of the above
    Here the use of the option 'none of the above' is not only grammatically inconsistent with
    the stem but it is logically inconsistent with the stem too. Moreover, the double negative
    effect of the option and the 'not' in the stem makes comprehension of the meaning difficult.
    A better form here would be to state the stem as a question along with the use of 'none of
    the above'. For example:
    Better form: Which is not the part of human brain?
    (a) Chromosomes (b) Plasma
    (c) Platelet (d) None of the above

(i) The item writer should avoid the use of the option 'all of the above' in the multiple-choice
    test. This type of response option is often used either when only three options can
    be generated and a fourth option is needed, or when there is a list of traits or
    characteristics of something and it is very difficult to come up with incorrect distractors.
    In the latter situation, it becomes easy to list three of the characteristics along with the
    response option of 'all of the above'.

So far as the scoring of multiple-choice items is concerned, it is very easy. A separate scoring
key is prepared and after matching each response with the scoring key, points are awarded and
subsequently these are added to give the total score. Scoring of multiple-choice items can also
be done through a punched-out key.
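Two of the points made above, namely randomly varying the position of the keyed response and scoring with a separate key, can be illustrated with a short Python sketch. It is only an assumed illustration: the function names, the use of letters (a), (b), (c), (d) as labels, and the one-point-per-item rule are hypothetical choices.

```python
import random
import string


def shuffle_options(keyed: str, distractors: list, rng: random.Random):
    """Place the keyed response at a random position among the distractors.

    Returns the shuffled options and the letter under which the keyed
    response now appears, which is what goes into the scoring key.
    """
    options = [keyed] + list(distractors)
    rng.shuffle(options)
    letter = string.ascii_lowercase[options.index(keyed)]
    return options, letter


def score_multiple_choice(responses: dict, key: dict) -> int:
    """Award one point for every response that matches the scoring key."""
    return sum(1 for item, letter in key.items() if responses.get(item) == letter)


rng = random.Random(2024)  # fixed seed only so that the layout is reproducible
options, letter = shuffle_options(
    "Spearman rho", ["Pearson r", "Biserial r", "Point-biserial r"], rng
)
for label, text in zip(string.ascii_lowercase, options):
    print(f"({label}) {text}")
print("Keyed response:", letter)

# Scoring two items against a hypothetical key.
print(score_multiple_choice({1: "d", 2: "a"}, {1: "d", 2: "c"}))
```

Recording the letter at the moment of shuffling keeps the scoring key in step with the printed order of the alternatives.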


3. Matching Item 


In a matching item, there are two columns: right and left. The items on the left column are to be
paired with the items on the right column. Items on the left constitute a set of related stems or
premises and items on the right constitute options or responses of the item. For example:

In the left-hand column the names of some statistical techniques have been given and in
the right-hand column the names of the persons who are closely associated with them.
Match the names with the statistical techniques:

    B   1. Product-moment method        A. Fisher
    C   2. Rank difference method       B. Pearson
    D   3. Coefficient of concordance   C. Spearman
    A   4. F ratio                      D. Kendall

It is obvious from the above example that this type of item is quite similar to
multiple-choice items. Both the types of items have a list of responses or options. Ordinarily,
dates, events, names, technical terms, diagrams, chemical names, etc., are used as stems in
matching items and more or less a similar variety is used as the responses of the item. The
examinee may be asked to match events with dates, names with events, books with authors'
names, terms with definitions, and so on. Thus matching items measure an examinee's ability to
associate one thing with another, and hence are effective in measuring achievement where
associative learning is involved.


Matching items have, however, some inherent weaknesses. First, it is not possible to
measure real understanding of concepts through matching items, partly because they simply
measure the factual association and partly because they do not measure the organizational ability
of the examinee. Second, if the examinee knows the correct answers of all pairs except one, he
may, through the process of elimination, correct the last pair. Third, matching items are not suited
when the purpose is to measure different types of information because the construction of such
items assumes that the set of stems as well as the set of alternatives would be homogeneous. One
different pair among the homogeneous pairs is likely to provide an extra clue to the correct
answer and, hence, it is not recommended. Lastly, the examinee takes longer to answer matching
items as compared to two-alternative items and multiple-choice items.
An item writer must keep in view the following suggestions and guidelines while writing 
matching items: 


(a) The number of alternative answers (or response options) must be greater than the
    number of stems. This is done in order to reduce the probability of correcting a pair
    (usually the last pair) through the process of elimination. For instance, when the number
    of stems is four and the number of alternative answers is also four, and if the examinee
    knows the correct answers to any three, then he will automatically, through the process of
    elimination, correct the last pair. But when the number of alternative answers is
    increased from four to six, this probability is automatically reduced.

(b) The item must contain clear instructions or directions regarding the basis of matching,
    that is, the instructions must state what should be the basis of matching. For example:

    Given overleaf are the names of some books and their authors. Match the names of
    the books with the names of their authors. Matching must be indicated by putting the
    appropriate letter in the space provided before each number.





    Matching   Authors                Books
    B          1. Freud               ...
               2. ...                 Experiments With Truth
               3. Mao Tse-Tung        India Divided
               4. Rajendra Prasad     The Merchant of Venice
    D          5. Shakespeare         E. Friends, Not Masters
                                      On Contradiction
                                      G. The Ra...

In the above example, there is a clear instruction for matching the names of the books with those
of the authors. Hence, an examinee will not falter in the process of matching.
(c) The set of stems and the set of response options must be worded in a homogeneous way.
    Placing a different stem among homogeneous stems or a different response option
    among homogeneous response options would provide a clue to the correct answer. For example:

    Poor form: Match the items given in column II with those given in column I. Indicate your
    answer by putting the letter of the correct answer in the space provided before each
    number of column I:

    Matching   Column I                     Column II
    C          1. 1821                      A. Liberation of Goa
    D          2. 1857                      B. 1526
    A          3. 1961                      C. Death of Napoleon
    B          4. First battle of Panipat   D. First War of Indian Independence
    F          5. 1688                      E. Birth of Swami Vivekananda
                                            F. Bloodless Revolution in England

    Improved form: Match the items given in column II with those given in column I. Indicate
    your answer by putting the letter of the correct answer in the space provided before each
    number of column I:

    Matching   Column I    Column II
    C          1. 1821     A. Liberation of Goa
    D          2. 1857     B. First battle of Panipat
    A          3. 1961     C. Death of Napoleon
    B          4. 1526     D. First War of Indian Independence
    F          5. 1688     E. Birth of Swami Vivekananda
                           F. Bloodless Revolution in England


The first example is inferior to the second example because in the first example, columns I
and II are not homogeneous. In column I are listed the years with an event in the fourth row
(First battle of Panipat) and in column II are listed the names of events with a year in the
second row (1526). Because the name of the event written in column I and the year written in
column II are easily identified or distinguished, these two may be paired without real knowledge.
Hence, its improved form will be to write the year in column I and the name of the event in
column II as has been done in the second example.
(d) The nature of the stems and the number of ec haa os ecseavanewa neta 
_evtoniy, Thies siguestion hasta mes ois He may be longer. Secondly, if 
he | “ en eepr guessing the correct pair are increased, 
€ matchin , the 


thus lowering the validity of the matching item. 
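The reasoning behind suggestion (a), that extra response options weaken the process of elimination, can be checked with a small calculation. The Python sketch below assumes that each response option is used at most once, that the pairs the examinee already knows are matched correctly, and that the remaining stem is answered by blind guessing among the unused options; the function name is hypothetical.

```python
from fractions import Fraction


def last_pair_guess_probability(n_stems: int, n_options: int, n_known: int) -> Fraction:
    """Probability of matching one remaining stem correctly by blind guessing.

    The options already used for the `n_known` known pairs are eliminated,
    so the guess is made among the options that are left over.
    """
    if n_options < n_stems or not 0 <= n_known < n_stems:
        raise ValueError("need n_options >= n_stems and 0 <= n_known < n_stems")
    remaining_options = n_options - n_known
    return Fraction(1, remaining_options)


# Four stems, three of which the examinee actually knows.
print(last_pair_guess_probability(4, 4, 3))  # four options: the last pair is certain
print(last_pair_guess_probability(4, 6, 3))  # six options: three candidates remain
```

With four stems and four options the last pair is settled by elimination alone, whereas with six options the examinee still faces three candidates for the last stem.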





Besides these general types of items, there are other specific types of formats which are
popularly used in the construction of psychological tests. They are:
(i) The Likert format
(ii) The category format
(iii) Checklists and Q-sorts

The Likert format is a very popular format used in attitude and personality scales. Here the
respondents are asked to indicate the degree of agreement with a certain attitude statement.
This format has been called the Likert format because it was part of Likert's (1932) method of
attitude scale construction. In this format, instead of asking for a yes-no reply, the respondents
are given a set of statements, each generally followed by five alternatives: Strongly agree,
Agree, Neutral, Disagree and Strongly disagree. The options are scored by giving weights like
5, 4, 3, 2 and 1 respectively for positive items and 1, 2, 3, 4 and 5 for negative items
respectively. In some applications, six options are used to prevent the respondent from being
neutral. In this case then, the response options will be Strongly disagree, Moderately disagree,
Mildly disagree, Mildly agree, Moderately agree and Strongly agree. Examples of items based on
the Likert format are as under:


    Statements                                   Response options
                                                 Strongly agree   Agree   Neutral   Disagree   Strongly disagree

    1. Coeducation improves self-confidence
       of the person
    2. Coeducation affects character
       development and morality negatively









Since responses in the Likert format can be subjected to factor analysis, psychometricians
developing the test can easily find groups of items that go together. The scales also require that
item analysis should be done to assess item discriminability.
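The weighting scheme for the five-option Likert format (5, 4, 3, 2 and 1 for positive items, reversed for negative items) can be sketched in Python as follows. The function names and the way the keying direction is stored are assumptions made only for illustration.

```python
WEIGHTS = {
    "Strongly agree": 5,
    "Agree": 4,
    "Neutral": 3,
    "Disagree": 2,
    "Strongly disagree": 1,
}


def score_item(response: str, positive: bool = True) -> int:
    """Weight a response 5..1 for a positive item and 1..5 for a negative item."""
    weight = WEIGHTS[response]
    return weight if positive else 6 - weight


def score_scale(responses: dict, positive_items: dict) -> int:
    """Add the item scores; `positive_items` says whether each item is positively worded."""
    return sum(
        score_item(responses[item], positive_items[item]) for item in responses
    )


# The two coeducation statements: item 1 is positive, item 2 is negative.
positive_items = {1: True, 2: False}
responses = {1: "Agree", 2: "Strongly disagree"}
print(score_scale(responses, positive_items))
```

Reversing the weights for negative items means that a high total always points in the same direction, here a favourable attitude towards coeducation.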


The category format is another popular item format used in psychological
measurement. It is similar to the Likert format but here the number of choices or options is
greater than that of Likert. The most popular category format is the 10-point scale. However, it
can have either more or fewer categories. Psychologists have concentrated upon the question of what
should be the optimal number of category points or response points. Some have argued that the
optimal number of points should be seven (Symonds 1925) whereas some others have suggested
that this optimal number should be three times this number (Champney & Marshall 1939). The
reality is that the number of categories required depends upon the fineness of the discrimination
respondents are willing to make. If the topic is such about which the respondents are not familiar,
they will not be able to make finer discriminations and a scale with few categories will do about
as well as a scale with many points. However, when the topic is such that the people are most
concerned about it, they will tend to respond best to a greater number of categories. However, for
most topics or tasks, a 10-point scale seems to provide enough discrimination (Kaplan &
Saccuzzo). Anderson (1991) has also reported that a 10-point scale provides the required
discrimination for a wide variety of stimuli. Evidence has also suggested that increasing the
number of categories may not increase reliability and validity; rather it may lower them
because the responses may include an element of randomness when there
are so many alternatives among which the respondents may fail to discriminate (Clark &
Watson 1998).
Another approach which is very similar to category scales is the visual analogue scale. In this method, the respondent is presented with a 100-millimetre line with two well-defined end points and is asked to place a mark between these two end points (Fig. 3.1). The scale is scored according to the measured distance in millimetres from the first end point to the mark.

[Figure: a 100-mm line anchored by 'No feeling of burning sensation' at the left end and 'Extreme feeling of burning sensation' at the right end, with a mark on the line]

Fig. 3.1 The 100-mm visual analogue scale

In Figure 3.1, the respondent has put a mark at 33 mm from the origin and therefore, his score for feeling of burning sensation is 33. Visual analogue scales are popular for assessing self-rated health. However, they are not used for multi-item scales because their scoring is unnecessarily time consuming (Clark & Watson 1998).

Checklists and Q-sorts are another popular format. In personality assessment, an adjective checklist is very common (Gough, 1960). In the adjective checklist, the respondent receives a long list of adjectives and he has to indicate whether or not each adjective is true in his or her case. Thus in the adjective checklist the respondent is required to either endorse such adjectives or not, allowing only two choices for each item. Such a checklist is used either for describing oneself or someone else. In one study the raters rated the students considered exceptional in originality as 'alert', 'curious', 'imaginative' and 'fair-minded' whereas they rated the students considered low in originality as 'confused', 'polished', 'prejudiced', 'conventional' and 'suggestible'. Q-sort¹ is another similar technique which is used to describe oneself or to provide ratings of others (Stephenson, 1953). This is specially used in the study of verbalized attitudes, preferences, self-description, etc. In this technique, the respondents are given a set of statements with the instruction to sort them into nine piles. One basic characteristic of Q-sort is that the frequency of items placed in each category generally looks like a bell-shaped curve.

In this way, there are several types of formats of items and depending upon the purpose, any one is chosen by the researcher. Of these various formats, multiple-choice, Q-sort and Likert formats are most popular in recent times. The other formats are less popular. Today checklists have almost gone out of use because they are more prone to error.


¹ The details of Q-sort appear in Chapter 12.

DIFFERENCE BETWEEN ESSAY-TYPE TESTS AND OBJECTIVE-TYPE TESTS

After having discussed the different types of essay-test items and objective-test items, it is essential that we must learn something about the differences between essay and objective types of tests. Following Ebel (1965), the main points of difference may be enumerated as follows:

1. In essay tests the examinee writes the answers in his own words whereas in objective-type tests the examinee selects the correct answer from among several given alternative answers.

In essay tests, the examinee is usually required to write down the answer in his own words. This gives him an opportunity to recall and express his ideas in the manner he likes, relate these ideas with some principles, and organize them into a coherent whole. He also gets an opportunity to give his own judgement. An examinee who can express himself well is, thus, likely to score better than the others. However, a common drawback of essay-type tests is that good handwriting and a fluent and graceful structure of sentences may hide some of the deficiencies in the contents of the answers and influence the examiner's or scorer's judgement favourably. On the other hand, any minor error in the structure of the sentence or flaw in spelling or usage may unfavourably influence the scorer's judgement regarding the content of the answer.

In objective tests, the examinee is not provided with the opportunity of free expression of his viewpoints. He is simply required to select the correct answer from among [...] for free expression, and the answers are [...]. Thus also, [...] tests give no opportunity [...]. Some experts are of the view [...] by the test constructor [...] the true knowledge of the examinee. It is also said that the objective items are, in most cases, tests of rote memory only. But these [...] do not seem to [...] because objective-test items can be written in such a way as to stimulate originality and critical [...], which will be the basis for choice among the alternatives. Not only this, objective items, if properly planned and written according to the suggestions given earlier, will [...] the correct response simply on the basis of recognition, on the basis of rote memory or on the basis of any verbal associations.

2. Thinking and writing are important in essay-type tests whereas reading and thinking are important in objective-type tests.

In essay tests, the examinee answers the questions in several lines. He critically thinks about the problem posed by the question, arranges the ideas in a sequence and expresses them in writing. Thus thinking, particularly critical or reflective thinking, and [...] in writing dominate. In objective tests, the examinee does not write much and in many cases he is simply asked to put a tick mark. However, in order to make the correct choice he is required to read both the stems as well as the alternative answers very carefully and then critically think and decide. Thus in objective-test items, reading and reflective thinking dominate.

3. It is difficult to score essay tests objectively and accurately whereas objective tests can be easily scored objectively and accurately.

Essay tests are difficult to evaluate objectively, partly because the answers are not fixed like the answers to objective items, and partly because of the variability in the scorer's judgement regarding the contents of the answers. This variability occurs because the scorer is freely guided by his individual preferences in scoring the answers. Since the answers are not fixed, judgement regarding their accuracy will also vary. In objective-type tests, whether of the selection or supply type, scoring can be done accurately because the answers are fixed. The scoring will also be objective because when the answers are fixed, there will obviously be complete interpersonal agreement (or objectivity) among the scorers.

4. In objective-type tests, the quality of the items is dependent upon the skill of the test constructor but in essay-type tests, the quality of the items is dependent upon the scorer's skill.

Writing items for an objective-type test is a relatively difficult task. Only a skilled test constructor can write good objective items. The quality of the test items is bound to suffer if the test constructor lacks skill in writing items or if he has limited knowledge regarding the subject. A test constructor even with a minimum [...] an essay item in producing the desired result [...] judgement regarding the good quality of a [...] the content of the answer. It is, therefore, often said that a good essay item is not [...] the reader's own skill in judging the worthwhileness of the content.

5. Objective-test items, no matter how well they are constructed, permit and encourage guessing by examinees, whereas essay-test items, no matter how well they are constructed, permit and encourage bluffing by examinees.

In objective test items, the probability of guessing cannot be fully nullified. The effect of guessing is an inflation of the actual score obtained on the test. Guessing is most obvious when the test is short and the two-alternative objective form is used or when difficult [...] are included in multiple-choice items or matching items and the length of the test is short. When blind guessing makes a higher contribution towards the make-up of the total test score, the reliability of the test is considerably lowered. If all examinees guess on all the items of the test, the reliability of the score would be zero. A similar type of phenomenon observed in the case of essay tests is bluffing. When examinees find essay items difficult, they often try to reinterpret them in a manner which is easier to [...]. They concentrate more on the length and form of the answer by adding simple facts as well as by deviating from the original idea in the hope that the reader's or scorer's attention will [...] from the real and true content of the answer. Like guessing, bluffing tends to [...] and further tends to deteriorate the reliability of an essay test. It may be recalled here that the reliability of the essay test is already questionable.


6. Assignment of numerical scores in essay-test items is entirely in the hands of the scorers whereas assignment of numerical scores in objective-type test items is entirely determined by the scoring key of the manual.

Objective-test items are scored according to a predetermined scoring key prepared by the test constructor. Usually, for each correct answer a score of +1 and for any [...] 100) and the maximum score to be given to any particular item, how much score should be given in any particular item and what should be the maximum or minimum score assigned to any examinee depends largely upon the will of the scorer. Suppose the maximum marks of an essay item is 20. A person scoring the answer may perhaps find the entire answer correct [...] paragraph which he finds wrong for which [...] score of 11. There is no rule for penalty. He may subtract even a score of 2 or [...] at all. Likewise, he may assign a total score of 80 on the whole test or he may [...] on the whole test. Thus manipulation and assignment of scores are entirely in the hands of the scorer.

Despite the above-mentioned points of difference between essay tests and objective-type tests, they are not altogether different. The main points of similarity are enumerated as follows:

1. An element of subjectivity is involved in both objective-type as well as essay-type tests. In objective-type tests subjectivity is involved in writing the test items, in selecting a particular criterion for validation of the test, etc. In essay-type tests, subjectivity is involved in writing and selecting the items. The most obvious effect of subjectivity in essay-type tests is seen in scoring of the essay items.

2. In both essay-type tests as well as objective-type tests, emphasis is placed upon the objectivity in the interpretation of the test scores. By objectivity is meant that the score must mean nearly the same to all observers or graders who have assigned it. If this is not so, it means that the scoring lacks objectivity, thus reducing the usefulness of the score. Besides, the score must mean the same thing to the person today and several days later.

3. Any educational achievement, such as the ability to spell English words, proficiency in grammar, performance in history, geography, educational psychology, etc., can be measured through either essay-type tests or objective-type tests. When the intention is to measure critical thinking, originality, and the organizational ability, essay-type tests are preferred. But when the intention is to measure piecemeal knowledge in any subject, objective-type tests are preferred. However, this line of demarcation is fast vanishing now because objective items have been used effectively for measuring achievement representing critical thinking and originality of the examinees. Likewise, essay items, particularly short-answer essay items, have been successfully used in measuring achievement representing piecemeal knowledge of any subject.


GENERAL GUIDELINES FOR ITEM WRITING

The different types of items used in psychological, sociological and educational research, and the specific guidelines for writing effective items, have already been discussed. Besides these specific guidelines, the item writer must also keep in view some general guidelines which help in writing good items. These are listed as follows.

1. Clarity in writing test items is one of the main requirements for an item to be considered good. Items must not be written as 'verbal puzzles'. They must be able to discriminate between those who are competent and those who are not. This is possible only when the items have been written in a simple and clear language. The items must not be a test of the examinee's ability to understand the language. The item writer should be very cautious particularly in writing the objective items because each such item provides more or less an isolated bit of knowledge, and there the problem of clarity is more serious. If the objective item is vague, it will create difficulty in understanding and the validity of the item will be adversely affected. Vagueness in writing items may be because of several reasons, such as poor thinking and incompetence of the item writer.


2. Nonfunctional words must not be included in the item as they tend to lower the validity of the item. Nonfunctional words refer to those words which make no contribution toward the appropriate and correct choice of a response by the examinees. Such words are often included by the item writer in an attempt to make the correct answer less obvious or to provide a good distractor.


3. The item writer must make sure that irrelevant accuracies, unintentionally incorporated
in the items, are avoided. Such irrelevant accuracies reflect poor critical ability to think on the
part of the item writer. They may also lead the examinees to think that the statement is true.


4. The items must not be too easy or too difficult for the examinees. The level of difficulty of
the item should be adaptable to the level of understanding of the examinees. Although it is a fact
that an exact decision regarding the difficulty value of an item can be taken only after some
statistical techniques have been employed, an experienced item writer is capable of
controlling the difficulty value beforehand and making it adaptable to the examinees. In certain
forms of objective-type items, such as multiple-choice items and matching items, it is very easy to
increase or decrease the difficulty value of the item. In general, when the response alternatives
are made homogeneous, the difficulty value of the item is increased but when the response
alternatives are made heterogeneous, except the correct alternative, the examinee is likely to
choose the correct answer soon and thus, the level of difficulty is decreased. The item writer must
keep in view the characteristics of both the ideal examinees as well as the typical examinees. If he
keeps the ideal examinees (who are fewer in number) in view and ignores the typical examinees,
the test items are likely to be unreasonably difficult ones.

5. Use of stereotyped words, either in the stem or in the alternative responses, must be avoided because these facilitate rote learners in guessing the correct answer. Moreover, such stereotyped words fail to discriminate between those who really know and understand the subject, and those who do not. Thus, stereotyped words do not provide an adequate and discriminatory measure of index. The most obvious way of getting rid of such words is to paraphrase the words in a different manner so that those who really know the answer can pick up the meaning.


6. Irrelevant clues must be avoided. These are sometimes provided in several forms, such as clang association, verbal association, length of the answer, keeping a different foil among homogeneous foils, giving the same order of the correct answer, etc. In general, such clues tend to decrease the difficulty level of the item because they provide an easy route to the correct answer. The common observation is that examinees who do not know the correct answer choose any of these irrelevant clues and answer on that basis. The item writer must, therefore, take special care to avoid such irrelevant clues. Specific determiners like never, always, all, none must also be avoided because they are also irrelevant clues to the correct answer, especially in the two-alternative items.

7. Interlocking items must be avoided. Interlocking items, also known as interdependent items, are items that can be answered only by referring to other items. In other words, when responding correctly to an item is dependent upon the correct response to any other item, the item constitutes an example of an interlocking or interdependent item. For example:

1. Sociometry is a technique used to study the affect structure of groups. True/False
2. It is a kind of projective technique. True/False
3. It was developed by Moreno, et al. True/False

The above examples illustrate interlocking items. Answers to items 2 and 3 can only be given when the examinee knows the correct answer to item 1. Such items should be avoided because they do not provide an equal chance for all the examinees to answer the item.

Recently, DeVellis (1991) has provided some encouraging and stimulating guidelines which are worth considering. Six of these guidelines may be presented as under:
(i) Whatever is going to be assessed should be defined clearly. For this, test developers should use substantive theory as a guide and should try to make items as specific as possible.

(ii) Test developers should avoid exceptionally long items, which are rarely considered good.

(iii) The level of reading difficulty should be appropriate for those who will take the test.

(iv) Test developers should develop a pool of items by writing three or four items for each one that will finally be used in the scale. They should avoid redundant items. Although theoretically all items are randomly chosen from a universe of item contents, practically, however, care in selecting and developing items is valuable.

(v) 'Double-barreled' items should be avoided. Such items convey two or more ideas at the same time.

(vi) Negative and positive items should be mixed. This is needed to check the 'acquiescence response set', which indicates that the respondents develop a tendency to agree with most items. To check or avoid such response bias, the test developers should write the same items in the negative direction. For example, in writing items about depression, the positive item would be: "I feel sad" whereas the negative item would be: "I feel hopeful and delightful."
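How mixing positive and negative items exposes an acquiescent respondent can be sketched in a few lines of Python. This is only an illustrative sketch: the 5-point agreement scale, the reverse-keying rule and the function name `score_mixed_likert` are assumptions made for the example, not a procedure given by DeVellis.

```python
def score_mixed_likert(responses, negative_items, points=5):
    """Total a respondent's Likert answers after reverse-keying negative items.

    responses: dict mapping item text -> chosen point (1..points), where a
               higher point means stronger agreement (assumed coding).
    negative_items: set of item texts worded in the negative direction.
    """
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= points:
            raise ValueError(f"response for {item!r} must be between 1 and {points}")
        # Reverse-key negative items: 5 -> 1, 4 -> 2, ... on a 5-point scale.
        total += (points + 1 - value) if item in negative_items else value
    return total


# Hypothetical two-item depression scale built from the example items.
negative = {"I feel hopeful and delightful."}

# A respondent who simply agrees with everything (acquiescence).
yea_sayer = {"I feel sad.": 5, "I feel hopeful and delightful.": 5}
# A respondent whose answers are consistent with feeling depressed.
consistent = {"I feel sad.": 5, "I feel hopeful and delightful.": 1}

print(score_mixed_likert(yea_sayer, negative))
print(score_mixed_likert(consistent, negative))
```

Under these assumptions the respondent who agrees with both items ends up near the middle of the score range instead of at the top, so blanket agreement no longer passes for a high trait score.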


Apart from these broad suggestions, test developers should keep in view that with the passage of
time, tests lose their half-life and about 12% of items become less reliable over this time (Chan,
Drasgow & Sawin 1999). In general, those items tend to lose reliability over time which are
focused on the meaning of a concept.


GENERAL METHODS OF SCORING OBJECTIVE-TEST ITEMS 


Scoring of the test items is one of the important aspects of test construction. There are several methods for scoring the responses. The following factors must be kept in view while selecting a method of scoring:

1. It must not take much time.
2. It must not be such that only a highly trained person can adopt it.
3. It must be objective. Objectivity here means interpersonal agreement among different scorers. If different persons are scoring the test, the method of scoring must be such that all of them arrive at the same conclusion.
4. The method of scoring should produce maximum accuracy.

The following are the important methods of scoring objective-test items.


Overlay-key Method 
In this method, a cut-out key is prepared, that is, a window which [...] item on each page is prepared. The correct answers are [...] just below, to face [...]. Subsequently, the key is overlaid upon each page in such a manner that [...] strip, they are [...] in an adjacent vertical column on the left of the respective items. The total score is the sum of the points given for each correct answer.
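The idea that the total score is the sum of the points given for each answer that agrees with the key can be sketched as follows. This is a minimal illustration under stated assumptions: answers and key are taken to be equal-length lists of option labels, each correct answer earns one point, and the function name `score_with_key` is hypothetical.

```python
def score_with_key(answers, key, points_per_item=1):
    """Return the total score of one answer sheet against a scoring key.

    answers: list of the examinee's chosen options, one per item.
    key: list of the keyed (correct) options, in the same item order.
    """
    if len(answers) != len(key):
        raise ValueError("answer sheet and key must cover the same items")
    # An item earns points only when the marked option matches the key.
    return sum(points_per_item for given, correct in zip(answers, key)
               if given == correct)


key = ["b", "c", "d", "c"]      # hypothetical four-item key
sheet = ["b", "a", "d", "c"]    # one examinee's responses
print(score_with_key(sheet, key))  # 3 of the 4 answers match the key
```

Because the key is fixed in advance, any two scorers applying it to the same sheet arrive at the same total, which is the interpersonal agreement the scoring factors above call for.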
So far as the scoring of essay-type tests is concerned, no such key-method can be developed. However, short-answer essay items can be scored like objective items if certain planning, such as the length of the answer, the type of factual information to be included, the weight or points to be given to each answer, etc., is done beforehand. A similar planning can also be done with the extended-answer essay items, but how far it will be successful is difficult [...].


Review Questions

1. [...] item. Also discuss the relative advantages and disadvantages [...].
2. [...] between essay-type tests and objective-type tests. Cite appropriate [...].
3. [...] precautions that should be taken in writing test items.
4. [...] of scoring objective-test items do you consider best?
5. Discuss the advantages and limitations of the different types of objective items.








4 
ITEM ANALYSIS 


Meaning and Purpose of Item Analysis
Power Tests: Item Difficulty
Index of Discrimination
Distractor Analysis
Optimal Difficulty Value for a Reliable Test
Index of Discrimination 
A Simplified Item Analysis Form 
Effectiveness of Distractors or Foils (Distractor Analysis) 
Speed Test 
Index of Difficulty 
Index of Discrimination 
Factors Influencing the Index of Difficulty and the Index of Discrimination 
Problems of Item Analysis 
Important Interactions Among Item Characteristics 
The Item Characteristics Curve (ICC) and Item Response Theory 


MEANING AND PURPOSE OF ITEM ANALYSIS 
After the items have been written, reviewed and carefully edited, they are subjected to a 
procedure called item analysis. Item analysis is a set of procedures that is applied to know the
indices for the truthfulness (or validity) of items. In other words, item analysis is a technique
through which those items which are valid and suited to the purpose are selected and the rest are
either eliminated or modified to suit the purpose. For the purpose of selecting the most
appropriate, valid and discriminating items the individual item performance by a group of
examinees is compared to their performance on the whole test. In brief, it can be said that item
analysis demonstrates how effectively a given test item functions within the total test. As a matter 
of fact, each item is an element within the test and in this sense, it is a test within a test. The 
validity of the whole test is dependent upon the validity of the individual items. Item analysis is a 
set of procedures that provides us with the estimate of validity of each item. The main objectives 
of item analysis are enumerated below. 
1. Item analysis indicates which items are difficult, easy, moderately difficult or moderately 
easy. In other words, it provides an index of the difficulty value to each item. 
2. It also provides indices of the ability of the item to discriminate between inferior and superior examinees. In other words, item analysis indicates the discrimination value of each item. This is known as item validity.
3. It indicates the effectiveness of the distractors in multiple-choice items. Since multiple-choice items are one of the most powerful and flexible objective [...] and many of the standardized tests utilize this form of item, a thorough item analysis is done to indicate the extent to which the distractors or foils are effective.
4. It sometimes also indicates why a particular item in the test has not functioned effectively and how this might be modified so that its functional significance can be increased.




POWER TESTS 


A power test is one where the examinee is allowed sufficient time for answering all items of the test. Thus, emphasis here is upon measurement of the ability (or power) of the examinee and not upon his speed. Ideally, [...] increasing order of difficulty.

The two most common indices which item analysis reveals are the index of difficulty value and the index of discrimination value. Item analysis also aims at knowing the effectiveness of distractors in multiple-choice items. This part of item analysis is known as distractor analysis. There are different methods of calculating the difficulty value and discrimination value of an item. A detailed discussion of these methods is presented below.


ITEM DIFFICULTY 


In item analysis, the first step is to find out the difficulty value of the item or the index of difficulty of an item. The difficulty value of an item is defined as the proportion or percentage of the examinees or individuals who answer the item correctly. This proportion or percentage is known as the index of difficulty of an item. If an item is answered correctly by 90% of the examinees, it obviously means that the item is relatively easy and is not discriminating well at the [...] bottom of the scale of the trait being measured, as it is discriminating only in 90 × 10 combinations or pairs of examinees. Likewise, if an item is answered correctly by only 3% of the examinees, it means that the item is a difficult one and is not discriminating well among the examinees [...] the higher level of the trait being measured, or is discriminating only among superior and genius examinees. Also, an item which is answered correctly by 100% or 0% of the examinees has no differentiating significance. The maximum number of possible discriminations between examinees of any item is 50 × 50 = 2500. This occurs when an item is answered correctly by 50% of the examinees and answered wrongly by 50%. But this does not mean that tests composed of items of only 50% difficulty level are always to be preferred because such tests would simply divide the total examinees into two groups: those above the 50% difficulty level and those below the 50% difficulty level. Such items would not differentiate between examinees in a group above or below the 50% level of difficulty. The proportion passing an item is inversely related to the difficulty of an item. The higher the proportion or percentage getting the item right (i.e., the higher the index of difficulty) the easier the item, and the lower the proportion getting the item right (i.e., the lower the index of difficulty) the more difficult the item. If a test is composed only of items of higher indices of difficulty or only of lower indices of difficulty, it will never be a good measuring instrument. Any test to be called a good measuring instrument must have some items of higher indices of difficulty which should usually be placed at the beginning of the test, some items of moderate indices of difficulty (ranging from 40% through 60%) which should preferably appear after that, and some items of lower indices of difficulty which should appear last of all. At the beginning of the test can be placed those items which are passed by 100%, while at the end, those items can be placed which are passed by no one or zero per cent. Thus the test items must
have a normal distribution with respect to their indices of difficulty. The reason why items of moderate difficulty indices (which usually range from 0.40 through 0.60) are preferred is that they signal the maximum variance (or variability among the examinees). Variance is defined as SD² and the standard deviation (SD) of an item which is scored either 1 or 0 is equal to $\sqrt{pq}$, where p is the proportion passing an item and q the proportion failing an item. There is a general consensus among experts that for the sharpest discrimination [...] the index of difficulty should be 50% or 0.50. If p of an item is 50% or 0.5, q is also, then, naturally 50% or 0.5. Therefore, SD is $\sqrt{0.5 \times 0.5} = 0.5$ which, when squared, becomes the variance, which is equal to 0.25. Directly, it can be said that the variance of an item is equal to p × q, in which p is the proportion passing a dichotomous item and q is equal to 1 − p. Variance is directly related to p and it is the largest when p is close to 0.5. Thus, the maximum variance that an item can yield is 0.25. Maximum variance indicates maximum separations among those who pass and those who fail. For example, in a group of 100 students, an item of 50% difficulty can discriminate between 50 who pass it and 50 who fail it. The total number of comparisons is, therefore, equal to 50 × 50 = 2500. Likewise, if the item is passed by 60%, the number of comparisons that this item is capable of yielding is equal to 60 × 40 = 2400. Thus the best item in terms of comparison or variability is the item whose index of difficulty is 0.50 or 50%. As the index increases or decreases, the variance of the item gradually decreases, that is, its ability to make comparisons among those who pass and those who fail decreases.
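The arithmetic of item variance and pass/fail comparisons can be checked with a short Python sketch. The function names are assumptions made for illustration; the formulas are the ones stated in the text, variance = p × q with q = 1 − p, and comparisons = (number passing) × (number failing).

```python
def item_variance(p):
    """Variance of a dichotomous (1/0) item with proportion passing p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p          # proportion failing the item
    return p * q         # the item's SD is the square root of this value


def pass_fail_comparisons(n_examinees, n_pass):
    """Number of pass-versus-fail pairs an item separates."""
    if not 0 <= n_pass <= n_examinees:
        raise ValueError("n_pass must lie between 0 and n_examinees")
    return n_pass * (n_examinees - n_pass)


# Tabulate variance and comparisons for a group of 100 examinees.
for n_pass in (100, 90, 60, 50, 3, 0):
    p = n_pass / 100
    print(f"p = {p:.2f}  variance = {item_variance(p):.4f}  "
          f"comparisons = {pass_fail_comparisons(100, n_pass)}")
```

Both quantities peak at p = 0.50 (variance 0.25, 2500 comparisons) and fall to zero for items passed by everyone or by no one, which is why such items have no differentiating significance.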
Basically, there are two important methods of determining the difficulty value of an item: the method of judgement and the empirical method. A brief discussion of each of these two is presented below.


Method of Judgement

In this method, the difficulty value of an item is determined on the basis of the judgement of experts. Items are given to a group of experts with the instruction to rank the items in their increasing order of difficulty. Subsequently, the test constructor takes a final decision, keeping in view the commonality of ranks assigned to each item by the different judges or experts. Sometimes, an estimate regarding the index of difficulty is made by the test constructor himself on the basis of the total time taken by the examinees in its solution. The demerit of both these types of procedure is that they cannot be fully reliable and objective.


Empirical Method
The empirical method, also called the statistical method, is the basic and scientific method of determining the index of difficulty of an item. There are two common statistical methods through which the index of difficulty can be estimated. One method determines the index on the basis of the responses of all the examinees whereas the other determines the index on the basis of the responses of only a portion of the examinees. Usually, the correct answer in the test is given a score of 1 and the incorrect answer a score of 0. Such items are known as dichotomous items, which are different from multipoint items where different points are given as scores, e.g., a score of 5, 4, 3, 2 and 1 may be given to the response categories strongly agree, agree, neutral, disagree and strongly disagree respectively. A score of +1 on the ability test indicates that the examinee has passed the item and zero means he has failed the item. On the nonability test (such as a test of personality, attitude, etc.), a score of +1 indicates the highest score and a score of zero indicates the lowest score on any given item. For a dichotomous item where the correct answer is scored 1 and the incorrect answer is scored 0, the index of difficulty can be determined by the following formula:


$$p = \frac{R}{N} \qquad (4.1)$$


where p is the index of difficulty; R is the number of examinees who pass the item (or correctly 
answer); and N is the total number of examinees who take the test. 
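
As a small illustration of formula 4.1, the following Python sketch computes p from a list of dichotomous item scores. The function name and the sample data are assumptions made for the example, not part of the text.

```python
def difficulty_index(responses):
    """Index of difficulty p = R / N for one dichotomous item.

    `responses` holds one score per examinee: 1 = pass, 0 = fail.
    The function name is illustrative only.
    """
    if not responses:
        raise ValueError("at least one response is required")
    if any(r not in (0, 1) for r in responses):
        raise ValueError("responses must be scored 0 or 1")
    passed = sum(responses)          # R: examinees who pass the item
    return passed / len(responses)   # N: all examinees who take the test


# Hypothetical data: 30 of 40 examinees pass the item.
item_scores = [1] * 30 + [0] * 10
p = difficulty_index(item_scores)
```

A higher p means an easier item, since p is the proportion passing.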

The difficulty or D values are important for two primary reasons. First, the D value directly influences the shape of the distribution of total test scores as well as their standard deviation. If the test consists of items of higher D values (i.e., easier items), the distribution of scores would be negatively skewed and the standard deviation would be small, whereas if the test consists of items of lower D values (i.e., difficult items), the distribution of total scores would be positively skewed and the standard deviation would again be small. Since it is desired that the distribution of the total scores be symmetrical and the items disperse examinees as far as possible, the test should preferably consist of items having D values close to 0.5. Such items will produce symmetrical distributions as well as the larger standard deviation. A test which consists of items having D values close to 0.5 (that is, in the range of 0.4 to 0.6) is known as a peaked test. Second, D values influence the reliability of the test, which is statistically defined as the self-correlation of the test. If the items have extreme D values (such as 0.2, 0.1, 0.8, 0.9, etc.), they are not likely to correlate with each other, that is, the inter-item correlation will be very low. As a consequence, the test will have a lower internal consistency reliability coefficient. If the items, on the other hand, have D values of the middle range (e.g., from 0.4 to 0.6), they will be expected to correlate with each other though not perfectly, because only items having very close D values of the middle range are likely to correlate perfectly. Naturally, then, items having D values of the middle range are likely to enhance the internal consistency reliability coefficient of the test. Thus, good items are those which have difficulty values near 0.5 because they correlate moderately with each other. This correlation has another implication. The correlation of an item with total scores on a test is directly related to the correlation of that item with other items in the test. Items having high inter-item correlations are expected to correlate with total scores of the test. Thus, D values not only affect the reliability coefficient of the test but also the item-total score correlation.
The index of difficulty can also be determined from only a portion of the group of examinees. If the examinees are ranked from highest to lowest on the basis of the total score, the upper 27% may be called the upper group and the lower 27% may be called the lower group. Thus, from the total number of examinees the middle 46% are set aside and the index of difficulty of each item is determined only on the basis of the remaining 54%. The index of difficulty determined on the basis of these two extreme groups is almost the same as when the entire sample of examinees is taken into account. The formula for determining the index on the basis of the extreme groups is given below:

$$p = \frac{R_u + R_l}{N_u + N_l} \qquad (4.2)$$





where p is the index of difficulty; R_u is the number of examinees answering correctly in the upper group; R_l is the number of examinees answering correctly in the lower group; N_u is the number of examinees in the upper group; and N_l is the number of examinees in the lower group.

To illustrate, suppose N_u = 200, N_l = 200, R_u = 150 and R_l = 50 for a particular item in a test. Then its index of difficulty will be as follows:

$$p = \frac{150 + 50}{200 + 200} = \frac{200}{400} = 0.50$$


There is a still easier way of determining the index of difficulty of an item when two equal extreme groups have been set up. On each item, the number of correct responses in the upper group and the lower group is counted and converted into proportions by dividing them by the N of the corresponding group. Thus for each item, the correct proportion in the upper group and the correct proportion in the lower group are known. Then, by the simple process of averaging the two proportions, we get D. For example, the number of examinees giving correct answers in the upper group is 150 and in the lower group it is 50 only. Converting these two into proportions, we get 150/200 = 0.75 and 50/200 = 0.25 respectively. Averaging these two proportions, we get

$$\frac{0.75 + 0.25}{2} = 0.50$$


This method of determining the index of difficulty of an item is particularly useful where the index of discrimination of the item is also to be determined on the same extreme groups. This saves labour and time in item analysis.
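
The two extreme-group computations can be sketched in Python as follows. The function names are assumptions made for the illustration; the numbers are those of the worked example (150 and 50 correct answers in groups of 200 each).

```python
def difficulty_from_counts(r_upper, r_lower, n_upper, n_lower):
    """Formula 4.2: pooled correct answers over pooled group sizes."""
    return (r_upper + r_lower) / (n_upper + n_lower)


def difficulty_from_proportions(r_upper, r_lower, n_upper, n_lower):
    """Average of the two group proportions (the 'easier way')."""
    return (r_upper / n_upper + r_lower / n_lower) / 2


# Worked example from the text.
p_counts = difficulty_from_counts(150, 50, 200, 200)
p_props = difficulty_from_proportions(150, 50, 200, 200)
```

Both functions return 0.50 here. They agree whenever the two extreme groups are of equal size, which is the situation the text describes; with unequal groups the simple average weights the groups differently from formula 4.2.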


It has been observed that sometimes examinees answer correctly on the basis of guesswork. Guessing not only inflates the actual score but also introduces a measurement error. The probability of guessing in any objective-test item cannot be ruled out completely. In the [...] item, the problem of guessing is automatically minimized [...] multiple-choice [...]. The effect of this guessing is to inflate the actual score on the test and increase the index of difficulty. The amount of increase in the index of difficulty is inversely related to the number of alternative responses for each item. The larger the number of alternatives, the smaller is the increase in the index of difficulty due to chance or guessing. It is, therefore, advised that a correction for chance success or guessing should be introduced and then, on the basis of the corrected score, the difficulty value should be obtained. In applying the correction for guessing, it is assumed that the examinees give wrong answers due to absence of real knowledge and that for such examinees all response options are equally attractive. If the index of difficulty is being determined on the basis of the upper group and lower group, it is proper first to correct the proportion of correct answers of the upper and lower groups and then determine the index of difficulty or discrimination. The formula for correcting the total score for chance success is given below:

$$S = R - \frac{W}{K - 1} \qquad (4.3)$$


where S refers to the corrected score; R refers to the number of correct responses; W refers to the number of incorrect responses; and K refers to the number of response options or choices used in the item. One assumption of the above formula is that all incorrect responses result from guessing. Examinees are expected to either get an answer correct or omit it. To illustrate the computation from the above formula, suppose an item is correctly answered by 300 examinees out of 400, the remaining 100 answer it incorrectly, and each item has five response options with one score for each correct answer. Instead of a total score of 300, the corrected score on that particular item will now be 300 - [100/(5 - 1)] = 275. Accordingly, p will be reduced from 300/400 = 0.75 to 275/400 = 0.6875, or about 0.69. Likewise, in the example cited above where N_u and N_l were used, R_u was 150 and R_l was 50 on the said item. Since 150 examinees in the upper group answered correctly, 50 examinees answered incorrectly; and since 50 in the lower group answered correctly, 150 answered it incorrectly. Further, suppose that the item had six response options. Now, the corrected scores in the upper group and lower group will be as follows:

S = 150 - [50/(6 - 1)] = 140.

So, the corrected score of R_u on the said item is 140 instead of 150.

S = 50 - [150/(6 - 1)] = 20.

The corrected score of R_l is 20 instead of 50 on the same item. Accordingly, D, which was 0.50, will be reduced as shown below:

$$p = \frac{R_u + R_l}{N_u + N_l} = \frac{140 + 20}{200 + 200} = \frac{160}{400} = 0.40$$
Thus, the corrected p is 0.40 instead of 0.50. Guessing can also be minimized through some 
nonstatistical methods. The examinees should be given a stern warning against guessing so that 
they resort to guessing the least when they do not know the correct answer. The distractors of the 
items should be made attractive to those who do not know the correct answer; and finally, every 
examinee should be given sufficient time for answering each item of the test. 
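
The arithmetic of the correction for guessing (formula 4.3) can be checked with a short Python sketch. The function name is an assumption for the illustration; the figures are those of the two worked examples in the text.

```python
def corrected_score(right, wrong, options):
    """Correction for chance success: S = R - W / (K - 1).

    right   -- R, number of correct responses
    wrong   -- W, number of incorrect responses (omissions excluded)
    options -- K, number of response options per item
    """
    if options < 2:
        raise ValueError("an item needs at least two response options")
    return right - wrong / (options - 1)


# First example: 300 right, 100 wrong, five options, 400 examinees.
s_total = corrected_score(300, 100, 5)
p_total = s_total / 400

# Second example: extreme groups of 200 each, six options.
s_upper = corrected_score(150, 50, 6)
s_lower = corrected_score(50, 150, 6)
p_groups = (s_upper + s_lower) / (200 + 200)
```

The sketch reproduces the corrected scores 275, 140 and 20 and the corrected difficulty values 0.6875 and 0.40 obtained above.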

The purpose of introducing correction for chance success or guessing is manifold. First, it is done to discourage guessing among the examinees. Second, it is done in order to give an apparent advantage to the examinees who don't guess over those who guess. Third, it makes the score meaningful and dependable. Fourth, the correction for guessing makes even those items directly comparable which have a different number of distractors.


' . vise Of ITS ASU O ye 
iy al | | rite ion te ™ 


what it intends to do qy Me 
re Jw nas la jails tO oa aye) ots Twe f that 
seer OT ann ated Fa ase UX undame 
qe TH BOCK sity he rorrmala | 
The core wrane remy ON 
4 yal 
it f ru : 4 
chet r¥ oe pew el US, 


ait d and each 
nees are lin PL 
rhe pram Son, 


howe } . 
QueSsiNg, In other Wore, 


: ion.Or partial iINforMation 
5 the first assuMplONn reveal, , 
| 
cated guess based 

at levet and educated £ On Parti) 
re really happens there * need 10 by 

Tih i , \ - ; | . oa 4 
myctt elemanate SE FRITS assumption expresses the POssibjy; 
oe iv hw i sep misinformation, partial know! Ct 
pate 3 road test constructor generally prov, 

hat 4 to attract examinees with paniay . 


eae wT 
hanger Mes 
that an earner - 
nearest ws ire Ociot 


ares TR KY Et spate 
ayer mat 4 arong a 

‘eon the reality © 

Csi % 

urect answer! SO 2 

leach ‘wrong TF urralat 10 2 CORTEX 

erect EntceTra er undercorrection and some technical factors are ak 

Rede thesr the problem oa ut Ig On 2 multiple-choice test with altemative 


ov pues how ones | 
nancies’ woth conecivr tr Ato the extent an examinee Can eliminate some Options 
ia mma reer ORT . 


' oe ion is eliminated, : 
the number of options. If even one option is eliminated, guessing is likely to help the examinee. On the other hand, some examinees who are not much familiar with knowledge of objective tests are often intimidated by knowing the fact that the correction for guessing formula is applied and, due to this realization, they may keep themselves out of guessing.

Another important characteristic of the correction for guessing formula is that it fails to change the rank order of the examinees if no questions/items are omitted. In such a situation, there will be a perfect correlation (1.00) between the corrected scores and the raw scores. When there are some omitted responses, the coefficient of correlation becomes less than 1.00, and even in such a situation it remains in the 0.90s if there is not much variation in the number of items omitted (Ebel 1979). As we know, reliability and validity are increased by correction for guessing only when there is a positive correlation between raw score and omitted responses (Ebel 1979). Of course, such circumstances normally don't occur because the examinee who answers most questions and works quickly is also more knowledgeable and testwise.
Some experts have suggested that intercorrelations of all items should be computed because this gives an indication regarding the distribution of item difficulties, which, in turn, helps in the process of item selection. Under such circumstances, the index of difficulty of the item is not considered separately as it automatically contributes to inter-item correlation. The best index of item intercorrelation is the phi-coefficient if the items are scored as either +1 or 0. When items are nearly equal in difficulty, their intercorrelation as measured by the phi-coefficient tends to be high and, therefore, the reliability of such a test is high. When items vary in the indices of difficulty, such items don't yield high intercorrelations and this yields a low reliability for the test.
As Gated earlier term hay | ee 


ike, only the most 
al. A test composed of 
lability of the test would 
Most preferred ones. Likewise, in 


Buarantee a high correlation with 


€ of 0.5 or near th 
hence. the relia 
5 are the 
not always 


| . however, D value rd 
er Cases the D value of 0.5 of near ieee 


te” ging the PROPOFONS Of the « 
yw ay Miners, ane in the per PP, 


} io in which item 7 has iter, ten 
aed 


ores When, for earyple. the ir Hem Ariusirnis 

Cit | tern | 

A the: the upper 7, eee 

e* | e irr i pet iy 

gers comet is alsa 9 Tsar that the it iehony 7 | a. 

| TT be ‘ily f) 

al tel ties, ri foopy —s 

s | 


| determines by 
a, : : Oe) 1) ye be reer Se hoone E| or 

" 1 ate] in thie le nert LT Te the 
; i ted ate m fOtry ld fable 
my thie od th. an a et Water) ahirve | 


OPTIMAL DIFFICULTY VALUE FOR A RELIABLE TEST

Generally, the test constructor is faced with the question: what should the difficulty of the items be? The question can be answered by determining what is the optimal level of difficulty. Obviously, most test constructors prefer that level of difficulty which produces maximum item intercorrelations and item-total correlations and, thus, also a higher reliability of test scores. So the items producing higher reliability are the ones with which the test constructor is concerned.
In the discussion on the index of difficulty, it was pointed out that the optimal level of difficulty is 0.5 or values which are very close to that. This conclusion should not, however, be taken as a general rule. When the number of items in the test is 30 or less, items having a difficulty of 0.5 or close to it yield higher intercorrelations and/or item-total correlations and also yield higher reliability. But in longer tests, items with a wider range of difficulty are preferred though they do not yield higher inter-item correlations. For true-false items, indexes of difficulty concentrating at 0.5 are not good because this would simply mean that the items were so difficult that every examinee had guessed. A test consisting of only such items would have a zero reliability coefficient. Likewise, a personality test consisting of agree-disagree items having indexes of difficulty at 0.5 is not a good test because it would mean that the items were highly ambiguous. It is, therefore, recommended that 'true-false' items or 'agree-disagree' items should have a p value above 0.70. Lord (1952) has suggested that the optimal level of difficulty for two-alternative items is 0.85 (uncorrected), for three-alternative items 0.77, for four-alternative items 0.74 and for five-alternative items 0.69.


INDEX OF DISCRIMINATION 


The determination of the index of discrimination, also known as the item validity index, is another important aspect of item analysis. The index of discrimination has been exhaustively described by Mosier & McQuitty (1940), Johnson (1951) and Ebel (1965). Since then, it has become the vogue to determine this index in any item-analysis work. The index of discrimination is that ability of the item on the basis of which the discrimination or distinction is made between superiors and inferiors (Blood & Budd 1972). Bean (1953, 153) has defined this index as “the degree to which the single item separates the superior from the inferior individuals in the trait or group of traits being measured.” Marshall & Hales (1972, 81) have said, “The discriminatory power or validity (V) of the item may be defined as the extent to which success and failure on that item indicate the possession of the trait or achievement being measured”. From the point of view of discriminatory power, all test items can be divided into items that are either (a) positively discriminating, (b) negatively discriminating, or (c) nondiscriminating. A positively discriminating item may be defined as one in which the proportion or percentage of correct answers is higher in the upper group. A negatively discriminating item may be defined as one in which the proportion or percentage of correct answers is lower in the upper group. Likewise, a nondiscriminating item may be defined as one in which the percentage or proportion of correct answers is equal or approximately equal in both the upper and lower groups. The negatively discriminating as well as the nondiscriminating items are dropped after item analysis because they do not make a positive contribution to the overall functioning of the test.


ie 
(Ti = 
Stiles Ta] flerrinn sl Corte lation 


AA eT a] Leven haa diteet 








ud | 4 
dc tah Une Batieee BNR, SURFS 


ALLA itv ines ol CSC TIMIN ALON: 


ha, ays os 
wifkwnee® of the diftvrence belween two Prop 
ok AM SEQEVTIN On| 
~ i 
Wy 


cual tes DiDiARES 


wh) eM Pee i 


4 hae 


pethoadts al determining the item validily is piven bit 
| ‘Toy 


Significance of the Difference between Two Proportions or Percentages
ea Ve es 


— sas 2 co 
To determine the validity index of an item, the examinees are divided into extreme groups on the basis of the total score. One may take the top and bottom 40% or 50% and compare the proportions passing the item in each group. But the common observation has been that such divisions (40% top and bottom or 50% top and bottom) do not give the best discrimination index. Kelley (1939) has demonstrated that when the extreme groups consist of the top 27% and the bottom 27% in a normal distribution, the ratio of the difference in means of the two groups over the standard error of the difference between the means (that is, the critical ratio) is at a maximum. Kelley has, therefore, recommended that N should be 370 so that in the upper 27% there will be 100 examinees and in the lower 27% there will also be 100 examinees. When the distribution is not normal, that is, when it is flatter than the normal one, the optimum percentage for the extreme groups is slightly higher than 27 and reaches approximately 33% (Cureton 1957). Many experts have been influenced by the finding of Kelley and since then, it has become a fashion to use the upper 27% and the lower 27% in determining the validity index of an item. The additional advantage in this approach is that on the same extreme groups, the index of difficulty can also be calculated and thus, both time and labour can be saved in the process of item analysis.

For making two extreme groups, the total scores are arranged from highest to lowest and then the top 27% and the bottom 27% are selected. Thus, the total score is the criterion for making these two extreme groups. When the proportions of examinees passing in the upper group (R_u) and in the lower group (R_l) have been determined, critical ratios can be applied to test the significance of the difference between the proportions of these two groups compared to their standard errors. If the difference comes to be a significant one, the item is accepted as one which discriminates. The demerit of the critical ratio is that it only indicates whether or not the given item discriminates but does not tell how well each item discriminates, that is, the amount of discrimination power of each item is not revealed.
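
The text does not give the formula for the critical ratio. The sketch below assumes one common form, the two-proportion z ratio with a pooled standard error; the function name and the choice of the pooled estimate are assumptions for the illustration, not the author's stated method.

```python
from math import sqrt


def critical_ratio(r_upper, r_lower, n_upper, n_lower):
    """Difference between two proportions over its standard error.

    Assumption: pooled standard error, sqrt(p*q*(1/N_u + 1/N_l)),
    where p is the proportion passing in both groups combined.
    """
    p_upper = r_upper / n_upper
    p_lower = r_lower / n_lower
    p = (r_upper + r_lower) / (n_upper + n_lower)
    q = 1 - p
    standard_error = sqrt(p * q * (1 / n_upper + 1 / n_lower))
    if standard_error == 0:
        raise ValueError("everyone passed or everyone failed the item")
    return (p_upper - p_lower) / standard_error


# Hypothetical item: 80 of 100 pass in the upper group, 70 of 100 in the lower.
z = critical_ratio(80, 70, 100, 100)
```

For these figures the ratio is about 1.63. Whatever cut-off is adopted, the ratio only says whether the groups differ, which is exactly the limitation noted above.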


Chi-square Test


Guilford (1954) has recommended the use of the chi-square test as a measure of the index of discrimination when there are equal numbers of examinees in both the extreme groups. He has suggested a special reduced formula of chi-square for this purpose:

$$\chi^2 = \frac{N(p_u - p_l)^2}{4pq} \qquad (4.4)$$

where N refers to the total number of examinees; p_u and p_l refer to the proportion of examinees passing in the upper group and lower group respectively; p is the arithmetic mean of p_u and p_l; and q is equal to 1 - p. However, the use of chi-square as a measure of the index of discrimination has some limitations. The chi-square can be used only with larger samples and with items falling in the middle range of attractiveness for the examinees. Moreover, the chi-square test only reveals whether or not the item is discriminating. It gives no idea of the amount of discrimination of the item.
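
A minimal Python sketch of formula 4.4 follows. The function name and the sample figures are assumptions for the illustration, and N is taken here to be the number of examinees in the two extreme groups combined.

```python
def chi_square_discrimination(p_upper, p_lower, n_total):
    """Reduced chi-square for two equal extreme groups (formula 4.4).

    p_upper, p_lower -- proportions passing in the upper and lower groups
    n_total          -- N, examinees in both extreme groups combined
    """
    p = (p_upper + p_lower) / 2   # arithmetic mean of the two proportions
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("chi-square is undefined when p or q is zero")
    return n_total * (p_upper - p_lower) ** 2 / (4 * p * q)


# Hypothetical item: proportions 0.80 and 0.70 in groups of 100 each.
chi2 = chi_square_discrimination(0.80, 0.70, 200)
```

With one degree of freedom, the value obtained here (about 2.67) would be compared with the tabled chi-square value at the chosen level of significance.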


Marshall & Hales (1972) have suggested a very simple and quick method of determining the index of discrimination. They have called this index the “Net D index of discrimination”. They have defined Net D as “an unbiased index of absolute difference in the number of discriminations


terms Avgeal yds tr 


te petween the i] I | were RT ‘ine thie levayerp Bri : 
yl jren between thet opie tl iv ee 
LD sh ihe er With Breit yy Like ther g OPH otticinal te the met cigesinn: 
Tila ands the setting Up of two ex) Te Hele fies Lise! at PCC AL tats 
i180 sil ing of the bottam gry (| Ce ROUTE ory Citvinntan te the Net D method 
‘ her i ornslil hi i b : od aii ‘is 4 thie PCT None, hie aa + ri Fil thie per ! 7%, anc the 
i! a ypene eiy Olea jig ] bres : , r : ay pe a, , : 
prt, I‘ al RA ' iY im ithe fe rie ¢ hetwienn the 1 la rif trom The \y Mowing 
“i pre and the GOTloOnn 27 camino. VIOpOnION af correct answors : f the 
i * } } 
tap * 
$$V = \frac{R_u}{N_u} - \frac{R_l}{N_l} \qquad (4.5)$$

or, since N_u = N_l,

$$V = \frac{R_u - R_l}{N_u}$$

where V = Net D; R_u and R_l refer to the number of examinees giving the correct response in the upper group and the lower group respectively; and N_u and N_l are the number of examinees in the upper group and the lower group respectively.

When the value of V is negative, the item displays negative discrimination and thus, it is dropped from the final form of the test. When the values of V (Net D) of items are above 0.40, they are thought to be discriminating well between superior and inferior. Net D values falling between 0.20 and 0.40 are considered to be discriminating poorly and those falling below 0.20 are considered to be the index of very little or negligible discrimination. To illustrate, suppose the number of correct responses in the upper group is 80 and the number of correct responses in the lower group is 70 on an item, and the total number of examinees in each of these two extreme groups is 100. Now the Net D may be determined as follows:

$$V = \frac{R_u}{N_u} - \frac{R_l}{N_l} = \frac{80}{100} - \frac{70}{100} = \frac{10}{100} = 0.10$$

or,

$$V = \frac{R_u - R_l}{N_u} = \frac{80 - 70}{100} = \frac{10}{100} = 0.10$$


The said item has the Net D of 0.10, which indicates that the item has negligible discriminatory 
power. Such items are ordinarily dropped or suitably modified, 
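
The Net D computation and the interpretation bands given above can be sketched in Python as follows. The function names and the label strings are assumptions for the illustration; the cut-offs (0.20 and 0.40) are those stated in the text.

```python
def net_d(r_upper, r_lower, n_upper, n_lower):
    """Net D (V): difference between the proportions passing (formula 4.5)."""
    return r_upper / n_upper - r_lower / n_lower


def interpret_net_d(v):
    """Verbal label for a Net D value, using the bands given in the text."""
    if v < 0:
        return "negative"      # item is dropped
    if v < 0.20:
        return "negligible"    # very little discrimination
    if v <= 0.40:
        return "poor"          # between 0.20 and 0.40
    return "good"              # above 0.40


# Worked example: 80 and 70 correct answers in groups of 100 each.
v = net_d(80, 70, 100, 100)
label = interpret_net_d(v)
```

For the worked example the sketch gives V of about 0.10 and the label "negligible", in line with the conclusion drawn above.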


Correlational Techniques 


Correlational techniques have also been frequently employed as measures of the index of item discrimination. In such situations each item is correlated against the internal criterion of the total score, that is, each item is validated against the internal criterion of the total score. This is called item-total correlation. Lindquist (1951) calls this the “internal-consistency item discrimination index”. When the correlation between the total score and the individual item score is computed as a measure of the discriminative power of the item, it shows how well the item is measuring that function which the test itself is measuring. The validity index exhibits the extent to which a particular item discriminates among examinees who differ sharply in the function measured by the test as a whole. As a matter of fact, item-total score correlation is regarded by most experts as the best index of discrimination. In selecting items on the basis of item-total correlations, it is better if at least 75% of the correlations are positive and preferably above 0.15. Five common measures of correlation, namely the Product Moment (or PM) correlation, biserial r, point biserial r, tetrachoric r and phi-coefficient, are frequently applied in determining the index of discrimination of an item. When multipoint items are employed, the product moment correlation is the most appropriate one. When items are in two-alternative responses or reducible to two-alternative responses, that is, when dichotomous items are used, the biserial r or point biserial r is the most preferred statistic. When the total score is also dichotomized along with dichotomous response items, the tetrachoric r or the phi-coefficient is used.
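
As a rough sketch of item-total correlation for dichotomous items, the following Python fragment computes the product moment correlation between each 0/1 item and the total score; for a dichotomous item this is numerically the point biserial r. The function names and the small score matrix are assumptions for the illustration, and the total score here includes the item being correlated.

```python
from math import sqrt


def pearson_r(x, y):
    """Product moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when a variable has no variance
    return sxy / sqrt(sxx * syy)


def item_total_correlations(score_matrix):
    """One item-total r per item; rows are examinees, columns are items."""
    totals = [sum(row) for row in score_matrix]
    n_items = len(score_matrix[0])
    return [
        pearson_r([row[j] for row in score_matrix], totals)
        for j in range(n_items)
    ]


# Hypothetical 0/1 scores of five examinees on three items.
scores = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
]
r_values = item_total_correlations(scores)
```

Because the item contributes to its own total, these values run slightly higher than a correlation with the total of the remaining items; with long tests the difference is small.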







The various correlation techniques described above involve a good deal of routine work. Flanagan was the first person to provide a short cut to these correlation coefficients, especially the biserial r, and thus to avoid the unnecessary routine work involved in them. He prepared an abac table* from which one can directly read the value of biserial r if one knows the proportion of examinees passing in the upper 27% group and the proportion of examinees passing in the lower 27% group. In each extreme group N should not be less than 100, which means that the total number of examinees should be 370.


Table 4.1 Item analysis worksheet

   1              2                        3                    4              5
 Item     N = 100 proportion      N = 100 proportion      Index of       Index of
 No.      correct in upper 27%    correct in lower 27%    difficulty     discrimination
   1            0.80                    0.20                 0.50            0.60
   2            0.65                    0.45                 0.50            0.39
   3            0.90                    0.40                 0.65            0.59
   4            0.40                    0.60                 0.50           -0.20*
   5            0.65                    0.85                 0.75           -0.25*
   6            0.55                    0.25                 0.40            0.30
   7            0.50                    0.50                 0.50            0.00*
   8            0.70                    0.70                 0.70            0.00*
   9            0.75                    0.05                 0.40            0.70
  10            0.95                    0.90                 0.92            0.05*

* Items either to be dropped or to be modified

The data in Table 4.1 illustrate the computation of biserial r with the Flanagan abac table on the basis of the top 27% and the bottom 27% of the examinees (separated on the basis of the score distribution). In Table 4.1, the item analysis of only 10 items of a test has been shown. In column 2, the correct proportion of those passing the items in the upper 27% of the group and in column 3, the correct proportion of those passing the items in the lower 27% of the group have been shown. (Thus the middle 46% of the group has been left out.) In column 4, the index of difficulty of each item has been shown. The index of difficulty has been found by averaging the proportions of columns 2 and 3. Item 1 has 0.80 and 0.20 as correct proportions in the upper and the lower group respectively. The average of these proportions is equal to (0.80 + 0.20)/2 = 0.50. Likewise, for the other items the difficulty value has been determined. The difficulty value in column 4 is thus calculated on the basis of 54% (top 27% plus bottom 27%) of the group. The index of discrimination of each item appears in column 5, which exhibits biserial r read directly from the Flanagan abac table. In the graph of this abac table, the correct proportions of the higher group are shown on the ordinate and the correct proportions of the lower group on the abscissa. The lines showing these two proportions intersect at a point in the graph. The point of intersection gives the coefficient of biserial r.

* For the abac table prepared by Flanagan for computing the biserial r coefficient, see Psychometric Methods by J P Guilford, 1954, pp. 428-31. There, readers can also find abacs for the point biserial coefficient, tetrachoric r and phi-coefficients.





ip 
ily lil In, | 


Hien Atvetlyais OF 


) cowlliciont Ob Diseral eet Cuillarel O54, Aen. Ut 


sal i" 
helt unig, Equation 4. 


A perusal of the above worksheet reveals that items 4 and 5 have negative discrimination and items 7 and 8 have zero coefficients, indicating no discrimination between superiors and inferiors. Item 10 has only a negligible r. Such items are usually not selected for final inclusion in the test. Items 1, 2, 3, 6 and 9 have positive biserial r coefficients, indicating that the items are discriminating. When all the biserial correlation coefficients have been obtained in the item analysis worksheet as shown in Table 4.1, they are ranked from highest to lowest. In general, a correlation coefficient of 0.20 or above is regarded as satisfactory and therefore, those items which have a validity index of 0.20 or above are selected for final inclusion in the test. At this juncture, Nunnally (1970) has made another suggestion regarding the selection of items. According to him, the selection of items should be directly dependent upon the level of reliability desired for the test. If the test constructor wants to have a reliability coefficient as high as 0.80 and he has dichotomous items, then a sample of the first 30 items having the highest correlations (after they have been ranked from highest to lowest) may be selected. Subsequently the Kuder-Richardson (KR) formula 20 (this formula has been discussed in Chapter 5), on the basis of these 30 items, would be applied to see whether or not the test has reached the desired reliability coefficient. If KR 20 yields a coefficient of 0.80, no other items are included in the test and the process of item analysis is over. If KR 20 yields a coefficient below 0.80, a few items having the next highest correlations with the total score should be added, and then KR 20 for the original 30 items plus the number of items added may be applied to compute the desired level of reliability.
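
Nunnally's selection procedure can be sketched in Python as follows. KR 20 is treated in Chapter 5; the standard form k/(k - 1) × (1 - Σpq/σ²) with the population variance of total scores is assumed here. The function names, the `step` parameter and the tiny data set are assumptions for the illustration only.

```python
def kr20(score_matrix, items):
    """KR 20 for the chosen item columns of a 0/1 score matrix."""
    k, n = len(items), len(score_matrix)
    if k < 2:
        raise ValueError("KR 20 needs at least two items")
    totals = [sum(row[j] for j in items) for row in score_matrix]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        return 0.0
    sum_pq = 0.0
    for j in items:
        p = sum(row[j] for row in score_matrix) / n  # proportion passing
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def select_items(score_matrix, item_total_r, target=0.80, start=30, step=5):
    """Take the `start` best items, then add `step` more until KR 20 >= target."""
    ranked = sorted(range(len(item_total_r)),
                    key=lambda j: item_total_r[j], reverse=True)
    size = min(start, len(ranked))
    while True:
        chosen = ranked[:size]
        reliability = kr20(score_matrix, chosen)
        if reliability >= target or size == len(ranked):
            return chosen, reliability
        size = min(size + step, len(ranked))


# Hypothetical data: four examinees, three items, correlations already ranked.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
chosen, reliability = select_items(scores, [0.9, 0.8, 0.7],
                                   target=0.70, start=2, step=1)
```

The loop stops as soon as the target is met, or when no items are left, in which case the returned coefficient shows that the pool cannot reach the desired reliability.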
While selecting bipolar items (which are common among the measures of personality, values and interests) on the basis of item-total correlations, sometimes an interactive procedure becomes essential. Ordinarily, when at least 75% of the items yield positive item-total correlations, the test is said to be an ideal one. It is not uncommon that the remaining 25% of the items yield either zero or significant negative correlations. The interactive procedure is applied to those items yielding significant negative correlations so that they can now yield significant positive item-total correlations. In order to do this, the scoring for all such items having negative correlations is reversed. For example, suppose a particular item of an extroversion test had been previously scored as +1 for “Yes” and had yielded a negative correlation; now it would be scored as +1 for “No”, and vice versa for items having the same negative correlations which previously had a score of +1 for “No”. In this way, the scoring of all items yielding significant negative correlations with the total score would be reversed. However, the scoring of items with positive correlations or nonsignificant negative item-total correlations would not be changed. Now, the entire test would again be administered to obtain a new set of total scores and, for each item, new item-total correlations would be computed. It is expected this time that the number of positive item-total correlations will increase. If many more items having negative item-total correlations still remain, the procedure can be repeated.
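
The reversal of scoring can be sketched in Python as below. The sketch only re-keys an existing 0/1 score matrix for the items flagged as having significant negative item-total correlations; the text calls for the test to be administered again, so re-keying stored responses is a simplifying assumption, as are the function name and the sample data.

```python
def reverse_keyed(score_matrix, reverse_flags):
    """Return a new 0/1 matrix with flagged item columns reverse-scored.

    reverse_flags[j] is True for items whose item-total correlation was
    significantly negative; unflagged items keep their original scoring.
    """
    return [
        [1 - score if flag else score
         for score, flag in zip(row, reverse_flags)]
        for row in score_matrix
    ]


# Hypothetical data: the second item correlated negatively with the total.
scores = [[1, 0], [0, 1], [1, 0]]
rekeyed = reverse_keyed(scores, [False, True])
new_totals = [sum(row) for row in rekeyed]
```

New item-total correlations would then be computed against the new totals, and the step repeated if significant negative correlations remain.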


A SIMPLIFIED ITEM ANALYSIS FORM 


Item analysis may be done in still more simplified and convenient ways without using the above table. Suppose the investigator is constructing an intelligence test in which there are 50 items and each item has four options. For item analysis, the simplified procedure to follow would consist of the following steps.

1. Each item will be scored according to the scoring key which provides the correct answer and the correct answer will be valued as 1, 2 and so on, or by a weight decided by the investigator.





Os 






2. The score given for each correct answer will be added and the total score for each examinee on that test will be determined.

3. The scored answer sheets will be arranged from high to low total scores.

4. Following the suggestion of Kelley (1939), the upper 27% and the lower 27% of the examinees will be selected and the middle 46% will be left intact. The total N would be 370 so that the upper 27% will consist of 100 cases and the lower 27% will also consist of 100 cases.

5. The number of examinees in the upper 27% group who responded to each option (of items No. 1 and 2, for example) is counted and entered in column 1 of Table 4.2. Similarly, in column 2 the number of examinees in the lower 27% group who responded to each option is entered.

Table 4.2 A simplified item analysis form

[Table body not recoverable. Columns: (1) upper 27%; (2) lower 27%; (3) difference; (4) D or V (discrimination); (5) responding from middle; (6) total correct; (7) difficulty value.]

* Correct alternative

6. The number of examinees in the lower group who selected the correct alternative is subtracted from the number of examinees selecting the correct alternative in the upper group, and is entered in column 3. In Table 4.2, the difference is 80 - 20 = 60. This is for item No. 1. The same value for item No. 2 is -20, where the number of examinees selecting the correct alternative in the lower group exceeds the number of examinees selecting the correct alternative in the upper group.

7. The difference found in column 3 (which is 60 in case of item No. 1) is divided by the number of examinees in the upper 27% group (or lower 27% group), which is 100 here. Thus 60/100 = 0.60, which is entered in column 4. The value of column 4 is called the discrimination index, abbreviated as D or V. For item No. 2, the same value is -0.20.

8. The number of examinees who responded correctly in the middle group (which is 110 in case of item No. 1) is counted and entered in column 5. If the upper group and the lower group each contain half the group, there will, of course, be no middle group and column 5 will be omitted. The same value for item No. 2 is 130.


of the distractors using the form 





jiem Analysis G9 


hoer col cocaminees whe Pespearie here) Crores thy Wm the tif peer, lorwer ane mri Ielle 
= yer 
phe nt 


Q . added. The sum of these three values represents the total number of examinees 
ae Ge 7 : : ; a = at 
group ‘i yored the items correc tly. The sum is 804 204 110 =210 which ts ente red in 
nave 


who « 6 for item No. 1. The same value for item No, 2? 15 240. 
rie se : et er 7 
cow Jue entered in column 6 which is 210 in case of itern No. 1, is divided by N, the 
= 4 all L Ss i jo z * > = \ r 
The mber af examinees taking the test. This is the proportion of the examinees in the 
qu : 


oup Who responded correctly (called difficulty value and often abbreviated as P ). 
gr 


p here is a = .56 (by Equation 4.1) and if we apply Equation 4.2, P becomes 
3 


10. 


total 


Thus, 
30+ 20_ — 100 _ 9.50. The same value for item No. 2 is .62. 
joo + 100 200 


Although the task of item analysis looks complex at first glance, it can be simplified and made
convenient by following the above steps and by keeping records of the data wisely.


EFFECTIVENESS OF DISTRACTORS OR FOILS (DISTRACTOR ANALYSIS)
Item analysis, especially in multiple-choice items, aims at determining the effectiveness of the
distractors. Indices of item difficulty and discriminative power simply indicate whether an item is
doing its intended job, and not why or why not. If a particular item is not functioning well or is
found to be defective, we have to examine the possible reasons, and one obvious method is to
examine the distractability of the incorrect options. This procedure is called distractor analysis.
Usually, the number of examinees answering each distractor of an item in both the upper group
and the lower group is counted. Ordinarily, any distractor to be called a good distractor must be
chosen by more examinees of the lower group. If this happens, the distractor is retained in the
form as presented in the item. But if a distractor is chosen by more examinees of the higher
group than examinees of the lower group, the distractor is regarded to be of poor form, and
the test constructor should rewrite or modify this distractor to make it appear more attractive to
the examinees of the lower group. A test constructor may often find some nonfunctional
distractors which, by definition, are those that contribute nothing to the test. Such distractors are
chosen by none of the examinees of either the upper group or the lower group. Test constructors
often drop such distractors from the item or carefully modify them to suit their purpose. There are
also distractors which are equally attractive to both the upper and lower groups. Such distractors
also need suitable modifications. Any unsuitable or poor distractor tends to lower the difficulty
value of an item and, therefore, it must be carefully dealt with.


Analysis of distractors does not require application of any complex formula. Examination of 
the alternatives selected by the upper group and the lower group in the item analysis makes it 
obvious how the distractors worked. To be more sure and exact, we can compute the 
discriminating power or discriminating index of each distractor as it has been done in Table 4.2. 
On the basis of the discrimination indices of the distractors, that is A, C and D, it can be said that 
distractor C is working in the right direction because it attracted more examinees in the lower 
group. Distractors A and D are not working at all because A was chosen by none of the upper and 
the lower groups, and distractor D is chosen equally by both the groups. From these sets of 
information, we come to the conclusion that items and/or distractors need rewording. From this 
example, it is also obvious that the correct response should have positive discrimination index 
whereas distractors should have negative discrimination index. 
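The reading of distractors described above can be sketched as a small Python routine. The option counts below are hypothetical: they are chosen only to agree with the pattern reported in the text (B correct with 80 and 20 choosers, A chosen by nobody, C favoured by the lower group, D chosen equally), since the individual cell values of Table 4.2 are not legible. The function names are likewise illustrative.

```python
def option_discrimination(upper_counts, lower_counts, group_size):
    """Discrimination index of every option: (upper - lower) / group size."""
    return {option: (upper_counts[option] - lower_counts[option]) / group_size
            for option in upper_counts}


def judge_distractor(upper_n, lower_n):
    """Classify a distractor from the number of choosers in each extreme group."""
    if upper_n == 0 and lower_n == 0:
        return "nonfunctional"        # chosen by nobody
    if lower_n > upper_n:
        return "working"              # attracts the lower group, as intended
    if lower_n == upper_n:
        return "equally attractive"   # does not separate the groups
    return "poor"                     # attracts the upper group more


# Hypothetical counts for one item; B is the correct alternative.
upper = {"A": 0, "B": 80, "C": 10, "D": 10}
lower = {"A": 0, "B": 20, "C": 70, "D": 10}

indices = option_discrimination(upper, lower, group_size=100)
verdicts = {opt: judge_distractor(upper[opt], lower[opt]) for opt in "ACD"}
print(indices)
print(verdicts)
```

A positive index for the keyed answer together with negative indices for the distractors is the pattern the paragraph above describes as desirable.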


If we don’t have the extreme group method at hand and are working on the basis of the
responses of all the examinees, we can compute the number of people expected to choose each
of the distractors using the following formula:

\[
\text{Number of persons expected to choose each distractor}
  = \frac{\text{Number of persons answering the item incorrectly}}{\text{Number of distractors}}
\]

Suppose, for example, that of the examinees who answer a particular item, 10 responded to A,
30 to B, 45 to C and [...] to D, [...] being the correct option. This means that [...] examinees
answered the item incorrectly, and the equation gives the number of examinees expected to
choose each distractor. In this case, more examinees chose [...] than expected, and fewer chose
[...]. When the number of examinees choosing a distractor significantly exceeds the number
expected, there are two possibilities. First, it is just possible that the choice may reflect partial
knowledge. The second possibility is that the item is poorly constructed. In a perfect test item,
we might expect the responses of the examinees answering incorrectly to be divided equally
among the number of distractors.
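A minimal Python sketch of this expected-frequency check follows. The response counts are hypothetical (the figures of the worked example are only partly legible), and the function name is illustrative.

```python
def expected_per_distractor(counts, correct_option):
    """Number of wrong answers divided equally among the distractors."""
    wrong = sum(n for option, n in counts.items() if option != correct_option)
    n_distractors = len(counts) - 1
    return wrong / n_distractors


# Hypothetical responses of all examinees to one item; B is the correct option.
counts = {"A": 10, "B": 60, "C": 35, "D": 15}
expected = expected_per_distractor(counts, "B")

# Positive values: chosen more often than expected; negative: less often.
deviation = {option: n - expected
             for option, n in counts.items() if option != "B"}
print(expected, deviation)
```

A distractor with a large positive deviation is the one to inspect first, for either of the two reasons given above.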
SPEED TESTS
Some standardized tests emphasize the speed of the examinee. They provide a limited time
within which not every examinee can reach the last item in the test. In other words, such tests
place primary importance upon the speed with which an examinee responds. Ideally, in a
speed test all items should be of a uniform degree of difficulty. Item analysis
of such speed tests poses a special problem to the test constructor. The index of difficulty and
the index of discrimination cannot be determined in exactly the same way as is done in the case
of a power test. As such, a separate discussion is needed.


Index of Difficulty 


In a speed test, items placed near the end are often answered wrongly or left unanswered not
because of their intrinsic difficulty but because of their position in the test. Hence, the usual
way to determine the index of difficulty, dividing the number of correct answers by the number
of examinees who take the test, would give a misleading index. In the speed test, therefore, the
index of difficulty of an item is determined by the number of correct answers to an item divided
by the number of examinees who answer or reach that item (and not the total number of
examinees). The formula for the index of difficulty becomes:

\[ p = \frac{R}{N_r} \tag{4.7} \]

where p is the index of difficulty; R is the number of examinees who gave the correct answer;
and N_r is the number of examinees who actually reached or answered (right or wrong) that item.

Suppose that a test consists of 50 items. Of these, the first 45 items were answered by all the
examinees (N = 100) but the remaining five items were answered by only a few. Further, suppose
that item 48 was answered by only 60 examinees, out of which 20 answered it correctly. Likewise,
item 50 was answered by only 20 examinees, out of which only 5 examinees answered it
correctly. Now, the index of difficulty for item 48 would be 20/60 = 0.33 (and not 20/100 = 0.20)
and the index of difficulty of item 50 would be 5/20 = 0.25 (and not 5/100 = 0.05).
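The two worked values can be checked with a short Python sketch. The function name is hypothetical; the figures are those of items 48 and 50 in the example.

```python
def speed_item_difficulty(n_correct, n_reached):
    """Index of difficulty for a speed-test item: correct answers divided by
    the number of examinees who reached (answered) the item."""
    if n_reached <= 0:
        raise ValueError("no examinee reached this item")
    return n_correct / n_reached


item_48 = speed_item_difficulty(n_correct=20, n_reached=60)
item_50 = speed_item_difficulty(n_correct=5, n_reached=20)

# For comparison, the misleading values based on all 100 examinees.
naive_48 = 20 / 100
naive_50 = 5 / 100
print(round(item_48, 2), round(item_50, 2), naive_48, naive_50)
```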

As in a power test, the index of difficulty of an item is also corrected for chance success or
guessing. The formula for the corrected index of difficulty of an item in a speed test is as under:

\[ p_c = \frac{R - \dfrac{W}{K - 1}}{N - \mathit{NR}} \tag{4.8} \]

where p_c refers to the corrected proportion of correct answers, that is, the index of difficulty;
R refers to the number of correct answers; W refers to the number of incorrect answers; K refers
to the number of response options in the item; N refers to the number of total examinees and NR
refers to the number of examinees who did not reach the item within the time limit.

To illustrate, suppose an arithmetic test whose items each have five response options was
administered to a group of 400 examinees who were required to answer them within a time
limit. One item was correctly answered by 300 examinees and incorrectly answered by 60
examinees, and 40 examinees did not reach the item and hence did not attempt it within the
time limit. Now, its corrected index of difficulty would be equal to

\[ p_c = \frac{300 - [60/(5 - 1)]}{400 - 40} = \frac{300 - 15}{360} = \frac{285}{360} = 0.7917 \]

The uncorrected index of difficulty for the same item by Equation 4.7 would be equal to
300/360 = 0.8333. Thus, guessing tends to inflate the difficulty value of the item in a way similar to
the power test.
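The corrected and uncorrected values in this illustration can be reproduced with the following Python sketch of Equation 4.8. The function and parameter names are illustrative only.

```python
def corrected_speed_difficulty(right, wrong, options, n_total, n_not_reached):
    """Index of difficulty corrected for guessing in a speed test:
    (R - W / (K - 1)) / (N - NR)."""
    if options < 2:
        raise ValueError("an item needs at least two response options")
    n_reached = n_total - n_not_reached
    if n_reached <= 0:
        raise ValueError("no examinee reached this item")
    return (right - wrong / (options - 1)) / n_reached


# Figures from the worked example: R = 300, W = 60, K = 5, N = 400, NR = 40.
corrected = corrected_speed_difficulty(300, 60, 5, 400, 40)
uncorrected = 300 / (400 - 40)
print(round(corrected, 4), round(uncorrected, 4))
```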
Index of Discrimination
The index of discrimination in a speed test cannot be determined through the correlation
between the total score and the individual item score. Why? The answer is that in a speed test,
the items are arranged in random order and not from easier to difficult order as in most power
tests. Besides, the time limit in the test does not allow all examinees to reach the last item in the
test. Consequently, items placed at the beginning of the test will have a higher difficulty value
whereas items placed towards the end of the test will have a lower difficulty value. Both these
types of extreme items would not correlate well with other items of the test and hence, would
also not correlate with the total score. Items placed in the middle of the test, on the other hand,
will tend to correlate with other items and consequently, with the total scores. In a speed test
there is no specification regarding the type of items to be placed at the beginning, middle or at
the end of the test. Generally, items tend to appear in an order which is arbitrary as well as
random. When the order is arbitrary, correlations of items with total scores also become arbitrary
and there is no sense in selecting items on the basis of the item-total correlation.
In a power test, the reliability of the test is dependent upon the number of items in the test. But
this is not true in case of speed tests, where the reliability is dependent upon the reasonable time
limit allowed for the completion of items. Therefore, in a speed test items are selected according
to the principles that will establish a reasonable time limit, which may yield a most reliable
distribution of total scores. Nunnally (1970) has suggested a very useful method for establishing
the time limit of a speed test. Suppose the test constructor is preparing a test of numerical ability
in which items tend to measure very elementary principles of addition and subtraction. Suppose
the test constructor feels that five such items may be answered in one minute. Then, he may allot
a total time of 20 minutes for 100 items. But this is simply a conjecture on the part of the test
constructor. In order to establish the ideal time limit, he is required to do a little more. He may
randomly select six groups of examinees having equal size, equal age, and equal intelligence. He
may then give different time limits to each group for answering the same test. Thus, he may give
15, 22, 18, 25, 30 and 10 minutes to six groups respectively. The most ideal time limit would be
one which will produce the most reliable distribution of scores. The reliability may be
determined either by a suitable method of estimating the reliability coefficient or by the standard
deviation of the total scores. It is convenient to get the estimate of reliability of such tests through
the computation of standard deviation. The larger the standard deviation, the higher is the
reliability of the test. Six time limits would yield six different standard deviations. The largest of
these values of standard deviation will yield the most reliable distribution of scores and
consequently, that time limit will be the ideal time limit for the test. Suppose that the time limit of
25 minutes produced the largest standard deviation. Then, the ideal time limit of the said test
would be 25 minutes.

Thus in a speed test, items are selected not on the basis of item-total correlation but on the
basis of the index of difficulty as well as upon the ideal time limit experimentally determined for
the test.
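The selection of a time limit by the largest standard deviation can be sketched as follows. The scores are invented for illustration (five examinees per group rather than a realistic sample), and the choice of the population standard deviation, `statistics.pstdev`, is an assumption; the text does not say which form is intended.

```python
from statistics import pstdev


def ideal_time_limit(scores_by_limit):
    """Return the time limit whose total scores have the largest standard
    deviation, together with the standard deviation found for every limit."""
    spreads = {limit: pstdev(scores) for limit, scores in scores_by_limit.items()}
    best_limit = max(spreads, key=spreads.get)
    return best_limit, spreads


# Hypothetical total scores of six comparable groups, keyed by minutes allowed.
scores_by_limit = {
    10: [10, 11, 12, 11, 10],
    15: [14, 16, 18, 15, 17],
    18: [18, 21, 24, 20, 22],
    22: [22, 27, 31, 25, 29],
    25: [24, 32, 40, 28, 36],
    30: [38, 41, 44, 40, 42],
}

best, spreads = ideal_time_limit(scores_by_limit)
print(best)
```

With very short limits almost nobody gets far, and with very long limits almost everybody finishes, so in both cases the scores bunch together; the sketch simply picks the limit under which they spread out most.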


FACTORS INFLUENCING THE INDEX OF DIFFICULTY AND THE INDEX OF
DISCRIMINATION

The difficulty of an item is influenced by several factors. The stem of the item may be complex
and/or ambiguous and this may lower the index of difficulty of the item. Learning experience or
previous experience of the examinee is also likely to affect the index of difficulty. If the examinee
is not familiar with the type and content of items, he is likely to score poorly on the test. This will
lower the index of difficulty. The nature of response alternatives also influences the index of
difficulty. When alternative responses are all homogeneous, the examinee is not in a
position to choose the correct answer easily. This is true with multiple-choice items. When
alternative responses are relatively less homogeneous, the examinee may easily hit the correct
answer.

The index of discrimination is also influenced by several factors. As the index of
discrimination is intimately related to the index of difficulty, many factors which influence the
index of difficulty also influence the index of discrimination. For example, previous experience
or learning experiences, the extent of vagueness in an item and the homogeneity of alternative
responses have a direct influence upon the index of discrimination by lowering it. Other factors
influencing the index of discrimination are heterogeneity of the examinees, ability of the
stem to frame a clear question, and the effectiveness of the distractors in appealing to those who
do not know the correct answer.


PROBLEMS OF ITEM ANALYSIS

Despite the fact that test technicians have devised numerous methods of item analysis so that a
test constructor may not face any difficulty or problem, some basic problems remain unsolved.
These problems are enumerated below.

1. Problem of spurious correlation in item-total correlation: Whenever the total score is used
as a criterion for validating the item, that is, whenever an individual item is correlated with the
total score, the obtained correlation coefficients are spuriously high. The problem is more serious
in case of a homogeneous test where all items measure more or less the same attribute. This
happens because in each total score, almost all the items make their contribution, that is,
each item is a part of the total score, which is used as the criterion in determining item-total
correlation. Two general conditions may be specified for minimizing the effect of spurious
correlation. The number of items should preferably be large in the test, and, as the items having
0.5 as index of difficulty have more of a spurious element than items of extreme indexes of
difficulty, such extreme items should also be included in the test.

2. Problem relating to dichotomous items: Dichotomous items (or bipolar items) pose some
intriguing problems in item analysis. Items with True-False, Yes-No, Agree-Disagree response
options are common among nonability tests such as personality tests, interest tests, attitude scales,
anxiety tests, etc. Ability tests, however, do not emphasize the bipolar items and mostly
concentrate upon the multiple-choice items. The problem in dichotomous items may be
illustrated through an example. Suppose the test constructor has written 100 items for an anxiety
test, each with Agree-Disagree response options. Fifty items are positive statements and 50 items
are negative statements. The test constructor prepares the scoring key in a way that all agreements
are scored as +1 and all disagreements are scored as zero. If such is the case, the correlation
of each item with other items will be very close to zero and therefore, all item-total correlations
will also be very close to zero. Moreover, all positive statements would correlate negatively with
the negative statements. In a situation where the item-total correlation for all items appears
either near zero or becomes negative, items cannot be selected on the basis of the item-total
correlation. This problem of item analysis has one solution. The solution lies in the test
constructor's ability to devise the scoring key in such a way that it may yield most of the
item-total correlations positive. In the above example, instead of scoring all agreements as +1
and all disagreements as zero, a score of +1 may be given to all positive statements when agreed
and to all negative statements when disagreed. Then, most of the item-total correlations will be
positive, and items can be selected on the basis of the correlation of an item with the total score.
3. Problem associated with control of unwanted factor: Ideally, in homogeneous tests, all
items should measure only one attribute and not any other attribute. Accordingly, items should
correlate with each other and not with any other factor. But usually it happens that items also tend
to correlate with a factor or factors which are not wanted. For example, the items of a numerical
aptitude test not only measure quantitative aptitude but also measure the factor of verbal
comprehension because items also involve the comprehension of words and sentences used in
them. In such a situation items, besides the factor of quantitative aptitude, would also tend to
correlate with the unwanted factor of verbal comprehension. In an item-analysis situation,
especially when items are to be selected on the basis of the item-total correlation, the effect of
such uncontrolled and unwanted factors poses a problem: how to purify the items from such
effect? Fortunately, there is a way out. The best procedure for the purpose of minimizing the
effect is to start with four times the actual number of items to be retained in the final form. Thus, if
a 40-item test is to be constructed, about 160 items are to be written. Each item then is correlated
with the total score as well as with the unwanted factor. The purified items would be ones which
yield higher positive correlation with the total score and low correlation with the unwanted factor.
Items yielding low correlations with the total score and high correlation with unwanted factors
are dropped. Subsequently, the Kuder-Richardson formula 20 may be applied to estimate the
obtained reliability of the test. If the reliability coefficient is higher than the correlation of the total
scores with scores on the unwanted factor (which is computed separately), it is said that the effect
of the unwanted factor in item selection on the basis of the item-total correlation has been
minimized.

4. Problem associated with guessing or chance success: As discussed earlier, the problem of
guessing or chance success is acute among the items with two-alternative options because here
an examinee has 50% chances of answering correctly on the basis of a guess. In the
multiple-choice items the problem of guessing is minimized, though not fully eliminated. The
effect of guessing is to elevate the score and thus, increase the index of difficulty and the
discriminative index. Test technicians, therefore, have argued for introducing a penalty for
guessing in the form of a correction. Generally, the penalty is introduced by subtracting the number of
wrong responses (attributable to blind guessing) from the number of correct responses. (The
formulas of correction have been discussed earlier.) There are some test technicians who oppose
the idea of introducing correction for guessing on the ground that guessing does not necessarily
reveal the lack of real knowledge being measured by the item. According to them, in many cases
the formulas of correction for guessing do not make correction for guessing but for answering
incorrect options due to misinformation or due to some other errors in construction of the item
itself. They further assert that if the test is sound, and gives sufficient time to every examinee to
answer all items, and is appropriate for those who are taking it, no correction for guessing should
be introduced (because the chance for guessing is automatically minimized).
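The rescoring proposed under the second problem above (dichotomous items) can be sketched in Python. The four-item key and the response patterns are hypothetical and far shorter than the 100-item anxiety test of the example; the function names are illustrative.

```python
def score_response(agrees, positive_statement):
    """+1 for agreeing with a positive statement or disagreeing with a
    negative one; 0 otherwise."""
    return 1 if agrees == positive_statement else 0


def total_score(responses, key):
    """Sum the keyed scores over all items of one respondent."""
    return sum(score_response(agrees, positive)
               for agrees, positive in zip(responses, key))


# Hypothetical key: True marks a positive statement, False a negative one.
key = [True, False, True, False]

consistent = [True, False, True, False]   # agrees with positive items only
yea_sayer = [True, True, True, True]      # agrees with everything

keyed_consistent = total_score(consistent, key)
keyed_yea_sayer = total_score(yea_sayer, key)
naive_consistent = sum(consistent)        # old key: every agreement scores 1
naive_yea_sayer = sum(yea_sayer)
print(keyed_consistent, keyed_yea_sayer, naive_consistent, naive_yea_sayer)
```

Under the old key the respondent who merely agrees with everything obtains the highest total, which is why positive and negative statements end up pulling against each other in the item-total correlations.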


Item analysis provides three types of information: (a) information about item difficulty,
(b) information about item discrimination, and (c) information about distractors. These three are
related though conceptually distinct. For instance, the difficulty value of an item affects its
discriminating power. Here we shall see how these three interact with each other.
Be 


Distractors and Difficulty: Any multiple-choice item could be very difficult or very easy
depending upon its distractors. For example:

A. The first psychological laboratory was established by Wilhelm Wundt in the year
(a) 1649
(b) 1779
(c) 1879
(d) 1959
B. The first psychological laboratory was established by Wilhelm Wundt in the year
(a) 1865
(b) 1869
(c) 1889
(d) 1879
Even a person who knows little about the history of psychology is likely to find item B a more
difficult test item than item A, because item A has ridiculous distractors. Obviously, the
difficulty level of the item is being affected by the plausibility of the distractors. If the test-taker
or the examinee knows nothing about the trait/domain being tested, any distractor might be
equally plausible. But in reality, the test-taker knows something about the domain being tested
and therefore, is not fooled by the ridiculous distractors. The reality is that the examinee
generally has an imperfect knowledge of the domain and therefore, may be easily fooled by
the plausible distractors. Item difficulty can be easily changed by rewriting or modifying the
distractors.
Distractors and Discrimination: Distractors not only affect difficulty but also the
discrimination value of an item. The presence of one or more implausible distractors tends to
lower the difficulty of an item. Items which are too easy don’t have any potential for making
discrimination. More or less, the same is true for extremely difficult items. In view of this, item
writers should pay special attention to writing good distractors because writing the stem of
the item or the correct answer is not very difficult. Thus distractors affect the level of
discriminating power of an item via making the item either very difficult or very easy.
Difficulty and Discrimination: The level of difficulty value places a direct limit on the
discriminating power of an item. If the item is such that every examinee chooses the correct
answer (p = 1.0) or every examinee chooses an incorrect answer (p = 0), such an item
cannot discriminate between those who are superior and those who are inferior with respect to
the domain being measured. It has been established that items with a difficulty value near 0.50
possess the maximum potential for making good discriminations. It should, however, be kept in
mind that a difficulty value near 0.50 does not guarantee that an item will be a good
discriminator. An extreme p value places a direct statistical limit on the discriminating power of
an item. On the other hand, when the p value is near 0.50, there are no statistical limits on the
discriminating ability of an item. However, a poor item with a p value of 0.50 is still a poor item.

Thus, we find that all these three characteristics of items interact considerably with each other.
THE ITEM CHARACTERISTIC CURVE (ICC) AND ITEM RESPONSE THEORY

ICC summarizes much of the information conveyed by item analysis. It is a graphic
representation of the probability of giving the correct answer to an item as a function of the level
of the attribute assessed by the test. In fact, ICC serves as a foundation of one of the most important
theories, the item response theory. This theory was specifically developed for the purpose of
understanding how individual differences in attributes of the examinees affect their behaviour
when confronted with a specific item.

ICC is used to illustrate discrimination power and item difficulty. The steepness or slope of
ICC conveys information about the discriminating power of the item. When item-total
correlation is positive, the slope of ICC is positive (curve A of Figure 4.1). When item-total
correlation is near zero, the slope is near zero, that is, flat (curve B of Figure 4.1). When item-total
correlation is negative, the slope of ICC is negative (curve C of Figure 4.1). Thus the relationship
between the slope of ICC and item-total correlation gives an index of an item’s discriminating
power.
[Figures 4.1 and 4.2: item characteristic curves A, B and C plotted against total test score
(horizontal axis, 20 to 24), with the vertical axis running from .00 to 1.00.]

Fig. 4.1 Three ICCs showing discriminating power

Fig. 4.2 Three ICCs showing variant difficulty values


The position of the ICC curve gives indication about the difficulty of each item. For the 
difficult items the ICC starts to rise on the right-hand side of the plot (high total test scores) and for 
the easier items, the ICC curve starts to rise on the left-hand side of the plot (low total test scores). 
Figure 4.2 shows ICC for three items which show similar discriminating power but which vary in 
difficulty. Item C is the most difficult and item A is the easiest. Item B is of
moderate difficulty.
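An empirical ICC of the kind plotted in Figures 4.1 and 4.2 can be tabulated directly from test data. The sketch below is one simple way of doing so, under the assumption that the curve is read as the proportion of examinees answering the item correctly at each total score; the data and the function name are hypothetical.

```python
from collections import defaultdict


def empirical_icc(total_scores, item_scores):
    """Proportion of examinees answering the item correctly (item score 1)
    at each observed total test score, in ascending order of total score."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for total, item in zip(total_scores, item_scores):
        counts[total] += 1
        hits[total] += item
    return {total: hits[total] / counts[total] for total in sorted(counts)}


# Hypothetical data: ten examinees, total scores from 20 to 24.
totals = [20, 20, 21, 21, 22, 22, 23, 23, 24, 24]
item = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]

curve = empirical_icc(totals, item)
print(curve)
```

A curve that rises with total score, as this one does, corresponds to a positively discriminating item; the score at which it begins to rise indicates how difficult the item is.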

Item response theory (IRT), also known as latent trait theory or item characteristic curve
theory, is one of the modern approaches for explaining and analyzing the relationship between
the characteristics of the examinee (ability) and responses to individual items. The theory states
that the probability of a particular response to a test item is a joint function of one or more
characteristics of the individual respondent and one or more characteristics of the test item.
According to classical test theory, a score is derived from the sum of the individual’s responses to
different items which are sampled from a larger domain representing a trait or ability. IRT tends to
consider the chances of getting particular items right or wrong. This new approach makes
extensive use of item analysis (Cascio 1987; Steinberg & Thissen 1995). According to IRT, each
item on a test has an independent item characteristic curve that describes the probability of
getting each particular item right or wrong, given a certain ability level of the examinee. In this
new approach, with the help of computers, items can be sampled and the specific range of items
where the examinee begins to have difficulty can be identified. In this way, the testers can make
an ability judgement without subjecting the examinees to all the test items. Perhaps this is the
most important advantage of IRT over the classical test score theory. That is the reason why
IRT is considered the most important development in psychological testing in the second half
of the 20th century.

There are different approaches to the construction of tests using IRT. Most of them use two
dimensions such as difficulty and discriminability. Some other approaches add a third dimension
for the probability that examinees having a low level of ability will get a correct response.
Whatever may be the approaches, they try to grade items in relation to the probability that
those doing well or poorly on the test will have different levels of performance. Following Guion
and Ironson (1983), item characteristic curves may be averaged to create a test characteristic
curve which gives the proportion of responses expected to be correct for each level of ability.

In fact, IRT makes a set of assumptions about the mathematical relationship between a
person’s ability and the likelihood that the examinee will answer an item correctly. These
assumptions, regarding underlying relationships and empirical outcomes of testing, form the
basis of item characteristic curves (Murphy & Davidshofer 1988).


One of the major advantages of IRT is that it provides measures that are generally sample
invariant. In other words, it yields such measures for describing item characteristic curves that
don’t depend on the sample from which test data are drawn. Thus the theory produces measures
of item difficulty, item discrimination, etc., that are invariant across different samples of people
who take the test. This is not true for the classical standard item analysis statistics. For example, the
same spelling test which is difficult for class IV students may be easier for class X students. Thus
the invariance of measures obtained using IRT is considered to be important since it allows
characteristics of items to be analyzed without confounding them (as classical item analysis
processes do) with the characteristics or abilities of the examinees.

The most attractive advantage of tests based upon IRT is that such tests can be easily adapted 
to Computer administration. The computer can easily and rapidly identify the Specific test items 
required to assess a particular ability level of the examinee. Thus, with this approach, the 
examinees don’t have to suffer the embarrassment of attempting several items beyond their
ability. Likewise, they are saved from wasting their time and efforts on items far below their 
abilities. In this approach, since every examinee gets a different set of items (which are apparently 
or likely to be matched with their abilities), the chance for cheating is greatly reduced. Weiss 
(1985) and Weiss and Yoes (1991) have readily demonstrated that computer-adaptive testing 
tends to increase efficiency by 50% or more by reducing the time required in responding to every 
item of the test. Additional advantages are that IRT can easily handle items that are written in 
different formats (Hayes 2000); it can identify those respondents who show unusual response 
pattern and it also reduces biases against people who are slow in completing the test iterns. 


IRT has been applied to solve many problems which are ordinarily difficult to solve by traditional approaches. Three such examples are: (a) getting information about ability from distractors, (b) tailored testing, and (c) item analysis for some specialized tests. These may be discussed as under.

(a) Getting information about ability from distractors: In the classical item analysis format, an answer is scored either right or wrong. The right answer is given a score of 1 and the wrong answer is given a score of 0. It makes no difference which distractor (wrong answer) is chosen. Supporters of IRT believe that some distractors are, in fact, better answers than others because choosing them reflects partial knowledge, whereas others will be chosen only if the examinee has little knowledge. Thus, on the basis of the responses given to distractors, some inferences about the abilities of the examinee can be drawn. IRT thus suggests that any response made to a test item provides some information about the person's ability or knowledge. Therefore, the theory insists upon the construction of a separate ICC for each possible response to a multiple-choice item, that is, for the distractors as well as the correct response. Figure 4.3 presents a set of item characteristic curves for the possible responses to a single multiple-choice test item having three distractors B, C and D and one correct response, that is, A. Choice A shows the positive discriminating power of the item.





[Figure: item characteristic curves for response choices A, B, C and D, plotted against ability]

Fig. 4.3 Item characteristic curve for each response of a multiple-choice item


Choice C (one of the distractors) also shows positive discriminating power, choice B doesn't have any discriminating power and therefore, is unpopular, and choice D shows negative discriminating power. Thus Figure 4.3 obviously suggests that although choice C possesses less discrimination, it is nearly as good an answer as choice A. Both B and D are unpopular responses. If one has poor ability, it is very likely that he will choose B. Thus, rather than scoring the item simply as either right or wrong (for all distractors), an examinee might be given nearly full credit for choosing C and might receive the lowest score for choosing B. In this way, a careful analysis of distractors might give some information about the ability of the examinees.

(b) Tailored testing: Another interesting application of item response theory is computerized tailored testing. Today it is being realized that rather than constructing a test that contains some items appropriate for the examinees having poor, moderate and high ability levels, it would be better to construct tests that are tailored to the ability level of the person taking the test. Developments in microcomputer technology and in psychometric theory lead us to believe that for any individual, a test is most precise if the difficulty of the items matches his ability level. Item response theory can be easily used to help tailor tests for the individual examinees. When an individual takes a test at a computer terminal, it is possible to estimate the examinee's ability at each step of testing and then to select the next test item which would match the examinee's estimated level of ability. Thus tailored testing provides a great deal of precision within a short test and IRT helps a lot towards this.
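The select-then-update loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item bank, its difficulty values and the halving step rule are hypothetical, and a real tailored test would re-estimate ability from an IRT model rather than with this crude step rule.

```python
# Hypothetical item bank: item name -> difficulty on an arbitrary ability scale.
ITEM_BANK = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.5}


def next_item(ability_estimate, item_bank, used):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    candidates = [name for name in item_bank if name not in used]
    return min(candidates, key=lambda name: abs(item_bank[name] - ability_estimate))


def run_tailored_test(answers, item_bank=ITEM_BANK, n_items=3, start=0.0, step=1.0):
    """Administer n_items adaptively; `answers` maps item name -> True/False.

    Assumed update rule (not an IRT estimator): move the estimate up after a
    correct answer and down after a wrong one, halving the step each time.
    """
    estimate, used = start, []
    for _ in range(n_items):
        item = next_item(estimate, item_bank, used)
        used.append(item)
        estimate += step if answers[item] else -step
        step /= 2
    return used, estimate


# A hypothetical examinee who can answer the three easiest items only.
answers = {"q1": True, "q2": True, "q3": True, "q4": False, "q5": False}
administered, final_estimate = run_tailored_test(answers)
print(administered, final_estimate)
```

The examinee never sees the very easy or the very hard items, which is the saving in time and embarrassment that the passage refers to.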

(c) Item analysis for some specialized tests: IRT is helpful in conducting item analysis for tests where normal guidelines, such as that item-total correlations should be positive and that item difficulty should be clustered around .50, fail to apply. Screening tests and criterion-keyed tests are examples of such specialized tests where these guidelines don't apply. Screening tests are those that are used to make preliminary decisions, or to make decisions as to whether or not the persons exceed some minimum level of knowledge needed to qualify for the position. In criterion-keyed tests, there are no right or wrong answers and an examinee's test score is determined by the extent to which his responses are similar to the responses of some known group. The famous MMPI is an example of a criterion-keyed test. Researches have shown that for analyzing criterion-keyed tests as well as screening tests, IRT is of special use. With the help of IRT in such





tests, [...] discrimination power and difficulty [...] in the field of psychological [...] (Kaplan & Saccuzzo 2001).


Review Questions

[...] Discuss the methods of calculating difficulty [...]
[...] item analysis and cite examples. Also, discuss the [...]
[...] discrimination index? Discuss the major steps in calculating [...] index of an ability test.
[...] difficulty index [...]
[...] value and validity index of an item. Point out the [...] in item analysis.
[...] among various characteristics of an item.
[...] Item Characteristic Curve? What is its significance in [...]
[...] application of item response theory for test development.


+ 








5 
RELIABILITY 


CHAPTER PREVIEW 


• History and Theory of Reliability
• Meaning of Reliability
    • Logical Meaning or Technical Meaning of Reliability
• Methods (or Types) of Reliability
    • Test-retest Reliability
    • Internal Consistency Reliability
    • Alternate-Forms Reliability
    • Scorer Reliability
• What is a Satisfactory Size for the Reliability Coefficient?
• Standard Error of Measurement
• Reliability of Speed Test
• Factors Influencing Reliability of Test Scores
    • Extrinsic Factors
    • Intrinsic Factors
• How to Improve Reliability of Test Scores?
• Estimation of True Scores
• Index of Reliability
• Reliability of Difference Score
• Reliability of Composite Score

HISTORY AND THEORY OF RELIABILITY 


As compared to physical sciences, in psychology, many things make the task of measurement very difficult. One important reason for this is that psychologists are rarely interested in simple qualities like height, weight, width, etc. In fact, they are interested in measuring complex things/traits like intelligence, beauty, aggressiveness, etc., which can neither be seen nor touched. Besides this, no rigid yardsticks are available to assess such traits. In the absence of rigid yardsticks, the test developer must use "rubber yardsticks" which may sometimes be stretched to overestimate some measurements and shrunk to underestimate the same at other times (Nunnally & Bernstein 1994). Psychologists thus have to assess their measuring instruments to determine how much 'rubber' lies in them. This 'rubber' conceptualizes the 'errors' involved in the measurement. The theory of measurement error has, thus, been the major concern while psychologists attempt to work on reliability.


The theory of reliability owes its debt to the British psychologist Charles Spearman, who took pains to develop its basic ideas. In the 1730s, De Moivre introduced the concept of sampling error, and later Karl Pearson developed the popular product-moment correlation. Spearman put these two basic concepts together in the context of measurement. Spearman's work provided the much-needed [...], and his work was published in 1904 in an article entitled "The Proof and Measurement of Association between Two Things". Spearman's work came to the notice of the measurement pioneer E. L. Thorndike of the United States of America, who published An Introduction to the Theory of Mental and Social Measurements. [...] very useful concepts, even by contemporary standards. Since 1904, many noted developments have taken place in the assessment of reliability. Among these, the most important was the article published by Kuder and Richardson in 1937, in which several new reliability coefficients were introduced. Of these reliability coefficients, the coefficients obtained by Kuder and Richardson became very popular later on. Cronbach and his associates made a further advancement (Cronbach 1989; Cronbach et al. 1972) by evolving various methods of identifying the sources of errors. More recently, the item response theory (IRT) has been formulated, and it takes advantage of computer technology for advancing the basic concepts of psychological measurement (McDonald 1999; Michell 1999). However, the basis of IRT rests upon many of the ideas introduced by Spearman about 100 years ago.


The classical test theory assumes that any observed score (X) is equal to the true score (T) plus the error score (E). This classical theory assumes that each person has a true score that would be obtained if there were no errors in measurement. But in reality, the measuring instruments are not perfect. Such errors are also produced by the characteristics of the individuals or situations that can affect the test scores but which have nothing to do with the attribute being measured. The difference between the true score and the score obtained results from the error of measurement, or error score. The same can be said in this way: the difference between the score the person obtains and the score one is really interested in equals the error of measurement.

$X - T = E$

The classical theory of reliability further assumes that the errors of measurement are not systematic, but random. Using the rubber-yardstick terminology, it can be said that the classical test theory uses rubber yardsticks in which the ruler stretches and contracts at random. The true score is obtained by finding the mean of observations from repeated applications (Kaplan & Saccuzzo 2001). When it is said that the measurement errors are essentially random, this does not mean that errors arise from random or mysterious processes; rather, it means that the causes of errors in measurement are so varied and complex that these measurement errors act like random variables. If the errors have the characteristics of random variables, then it is reasonable to assume that the errors are equally likely to be positive or negative, and that they are not correlated with true scores or with errors of other tests. In other words, it is assumed that:

Mean error of measurement = 0
Correlation between true scores and errors = 0
Correlation between errors on different measures = 0

On the basis of these assumptions, a very extensive theory of test reliability has been developed (Gulliksen 1950; Lord & Novick 1968). Some important facts, too, have been derived from this theory that have obvious implications for measurement. One such fact is that this classical theory of reliability states that the variance of obtained scores is simply the sum of the variance of true scores plus the variance of errors of measurement. In terms of an equation,

$\sigma_X^2 = \sigma_T^2 + \sigma_E^2$

where $\sigma^2$ = variance or standard deviation squared. This formula, in effect, suggests that the test scores vary as a result of two factors: (a) variability in true scores, and (b) variability due to errors of measurement. If much of the variability observed in test scores can be accounted for in terms of errors of measurement, the test scores will be inconsistent (poor degree of reliability). On the other hand, if errors of measurement have little effect on test scores, the test scores will be consistent and reliable, reflecting those consistent aspects of performance which are labelled as true scores. The reliability coefficient provides an index of the relative impact of true and error scores on the obtained test scores.
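The additive relation between true, error and obtained variance can be checked with a small simulation. The sketch below is an illustration only: the number of examinees, the normal distributions and the standard deviations (8 for true scores, 4 for errors) are assumptions chosen for the example, not values from the text.

```python
import random
import statistics


def simulate_scores(n_people=5000, true_sd=8.0, error_sd=4.0, seed=1):
    """Generate X = T + E with errors that are random and independent of T."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_people)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed


true_scores, errors, observed = simulate_scores()

var_true = statistics.pvariance(true_scores)
var_error = statistics.pvariance(errors)
var_observed = statistics.pvariance(observed)

# Share of obtained-score variance that is true-score variance.
reliability = var_true / var_observed

print("mean error:", round(statistics.fmean(errors), 3))
print("true + error variance:", round(var_true + var_error, 2))
print("obtained variance:", round(var_observed, 2))
print("reliability:", round(reliability, 3))
```

With these assumed standard deviations the expected ratio is 64 / (64 + 16) = 0.80; the simulated value should fall close to it, and the small gap between the obtained variance and the sum of the two components is the sample covariance that the theory assumes to be zero.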
The domain sampling model is another important concept in classical test theory. This tries to deal with the problems created by using a limited number of items to represent a larger ability/construct. Suppose a researcher wants to evaluate the spelling ability of a person. The best way would be to go systematically through a standard dictionary, ask the person to spell each word, and finally, determine the percentage of words correctly spelled. But it is unlikely that this would be done because no researcher would spend time on such an exercise. Therefore, he would like to use a sample of words for evaluating the spelling ability. Had the researcher given all the words in the English language for evaluating the spelling ability, the percentage of words correctly spelled would have been the true score. In reliability analysis, the task of the researcher is to estimate how much error is made in using the score from the shorter test as an estimate of the true ability. In fact, the domain sampling model conceptualizes reliability as the ratio of the variance of the observed score on a shorter test and the variance of the long-run true score. The measurement error analyzed in this model is the error introduced by using a sample of items rather than the whole domain. As the researcher increases the number of items in the sample, the sample represents the domain more accurately. As a result, the greater the number of items or samples, the higher the reliability. Reliability can be estimated from the correlation of the observed test score with the true score. Finding the true score is rarely possible. Therefore, it is assumed that when items are randomly drawn from a given domain, each sample of items should provide an unbiased estimate of the true score.
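The claim that a larger sample of items represents the domain more accurately can be illustrated with a simulated spelling test. This is a minimal sketch under assumptions that are not in the text: each examinee is given a hypothetical "true" proportion of the dictionary he can spell, drawn uniformly between 0.3 and 0.9, and each sampled word is spelled correctly with that probability.

```python
import random
import statistics


def pearson(x, y):
    """Product-moment correlation between two equally long lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def observed_vs_true(n_items, n_people=2000, seed=7):
    """Correlate scores on an n_items sample with the assumed true proportions."""
    rng = random.Random(seed)
    true_props = [rng.uniform(0.3, 0.9) for _ in range(n_people)]
    observed = [
        sum(rng.random() < p for _ in range(n_items)) / n_items
        for p in true_props
    ]
    return pearson(observed, true_props)


r_short = observed_vs_true(n_items=10)
r_long = observed_vs_true(n_items=100)
print("10-word test :", round(r_short, 3))
print("100-word test:", round(r_long, 3))
```

The longer test should correlate more closely with the true proportions, which is the sense in which adding items raises reliability in this model.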


The generalizability theory is another alternative approach presented by Cronbach et al. (1972) for measuring and studying the consistency of test scores. The most serious weakness in the classical reliability theory is the concept of error of measurement, which is assumed to be random. Cronbach has pointed out that the factors that determine the amount of measurement error are different for internal consistency methods than for test-retest or alternate-forms methods. The reliability coefficient is taken as the ratio of the true score variance to the true score variance plus the error variance, but if the make-up of the true score and error score changes when the estimation procedures change, something is definitely wrong.

The generalizability theory identifies both systematic and random sources of inconsistency that may contribute to the error score. Cronbach et al. (1972) have claimed that the classical reliability theory, which breaks down the obtained test score into the true score and random error components, is simply a special case of the generalizability theory. The primary difference between these two theories is that the generalizability theory, unlike the classical theory, recognizes that error is not always random; rather, there are some specific and systematic sources of inconsistency in measurement that must also be taken into consideration. In the generalizability theory, the focus is on our ability to generalize from one set of scores to a set of other plausible scores. The central question in the generalizability theory is "What are the conditions over which one can generalize?" or "Under what sorts of conditions would one expect either similar or different results than the ones obtained in one situation?" The generalizability theory prompts us to ask questions like "In what situation is the test reliable?" rather than simply asking whether or not a test is reliable. According to this theory, a test can be reliable for some purposes but can be quite unreliable for other purposes. Thus, this theory basically focuses on what we wish to do with a test rather than on the test itself. The generalizability theory claims that if the scores differ systematically according to when, where or how the test was taken, these differences will affect the generalizability of the test scores as well as the meaning of the test reliability.


Suppose we wish to assess an individual's [...] using a test administered by an individual clinician. Since there is [...], three different psychologists test the person and each psychologist repeats the test three times. The resulting scores from such a procedure might look something like those depicted in Table 5.1. (Scores on this hypothetical test range from 1 to 100.)

Table 5.1 Test scores for three different psychologists, each of whom tests the individual three times

Time of Testing      Psychologist
January:             70, 80, [...]
April:               70, [...]
July:                [...]
Average:             [...]


Now the question is: Is this test reliable? Strictly speaking, the answer depends upon how we define reliability. For example, both psychologists A and C have fairly consistent scores over time, but B consistently differs from psychologists A and C. The three psychologists agreed best in January and April but agreed less in July.

Data presented in Table 5.1 can be explained by using the classical reliability theory, that is, we can treat the differences in test scores as random errors. However, it will be more informative to examine the systematic differences between psychologists or between the various times of testing, and this information can be used in describing circumstances under which one can or cannot generalize test scores. ANOVA, a statistical technique, can be applied here for measuring the systematic effects of variables (psychologists, times, etc.) on the consistency of measurement. This statistic would provide estimates of the variability in test scores associated with systematic differences in scores assigned by psychologists and differences in scores obtained in January, April and July. Average test scores over the time periods of January, April and July were more or less similar. Therefore, variability due to different times of testing is negligible. On the other hand, there were consistent differences between psychologist B and psychologists A and C. As a result, a larger portion of variability in scores is associated with differences between psychologists. Lastly, a good deal of variability is not systematically associated with the psychologists or with the time of testing.

A simple look at Table 5.1 suggests that scores can be more authentically generalized over time (average scores at each time being more or less similar) than over psychologists (psychologists differ in terms of their average scores). Thus the question posed in the beginning, "Is this test reliable?" would be answered differently depending upon whether the stability of scores over time or the stability of scores obtained by different psychologists is wanted.
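The partition of score variability into a part due to time of testing, a part due to psychologists and a remainder can be sketched as a simple sum-of-squares decomposition. The scores below are hypothetical and are not the values of Table 5.1; they are chosen only so that time averages are similar while one psychologist scores consistently higher. This is a descriptive sketch, not a full generalizability study.

```python
import statistics

# Hypothetical scores (NOT the values of Table 5.1).
# Rows are times of testing; columns are psychologists A, B and C.
scores = {
    "January": [70, 80, 72],
    "April": [72, 82, 71],
    "July": [71, 84, 70],
}

rows = list(scores.values())
n_times, n_psych = len(rows), len(rows[0])

grand = statistics.fmean(x for row in rows for x in row)
time_means = [statistics.fmean(row) for row in rows]
psych_means = [statistics.fmean(col) for col in zip(*rows)]

# Sums of squares for a two-way layout with one score per cell.
ss_time = n_psych * sum((m - grand) ** 2 for m in time_means)
ss_psych = n_times * sum((m - grand) ** 2 for m in psych_means)
ss_resid = sum(
    (rows[i][j] - time_means[i] - psych_means[j] + grand) ** 2
    for i in range(n_times)
    for j in range(n_psych)
)
ss_total = sum((x - grand) ** 2 for row in rows for x in row)

# Proportion of total variability attributed to each source.
share = {
    "time": ss_time / ss_total,
    "psychologist": ss_psych / ss_total,
    "residual": ss_resid / ss_total,
}
for source, value in share.items():
    print(f"{source:13s}{value:.3f}")
```

In this made-up data set almost all of the variability is tied to differences between psychologists, so the scores generalize well over time but poorly over psychologists.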


MEANING OF RELIABILITY 


Reliability is one of the important characteristics of any test. In its simplest sense, reliability refers to the consistency of a measurement or score. A well-made scientific instrument should yield consistent results both at present as well as over time. In other words, such an instrument should give consistent results. Reliability refers to this consistency of scores or measurement, which is reflected in the reproducibility of the scores. When all other factors are held constant or somehow controlled, a reliable test is one that produces identical (or at least highly similar) results for an examinee from one occasion to the other. A test is said to be consistent over a given period of time when all the examinees retain their same relative ranks on two separate testings with the same test. A test is also said to be consistent (when administered once) if the examinees who obtain high scores on one set of items also score high on an equivalent set of items, and those who obtain a low score on one set of items also score low on an equivalent set of items. The consistency of scores obtained upon testing and retesting is referred to as the 'temporal stability' of the test scores, and the consistency of scores obtained from two equivalent sets of items of a single test after a single administration is referred to as the 'internal consistency' of the test scores. Thus, reliability may be defined as the consistency of scores obtained from one set of measures to another. According to Anastasi & Urbina (1997, 85), reliability refers to "the consistency of scores obtained by the same individuals when re-examined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions." It is obvious from the above interpretation that reliability is never the property of the test itself; rather, it is the property of a test when it is administered to the examinees, that is, it is the property of the test scores. The correlation coefficient indicating temporal stability is known as the coefficient of stability, and the correlation coefficient indicating internal consistency is known as the coefficient of internal consistency or the alpha coefficient. Any statistical measure of reliability must indicate both the coefficient of stability as well as the alpha coefficient. For obtaining the coefficient of stability, the two sets of measurements or scores, found upon testing and retesting, are correlated with each other. Likewise, for obtaining the alpha coefficient, the two sets of measurements or scores obtained from two equivalent sets of items of the same test after its single administration are correlated with each other. This is why, statistically, reliability is also defined as the self-correlation of the test.
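Both coefficients described above come down to correlating two sets of scores from the same examinees. The sketch below computes them with a hand-written product-moment correlation; all scores are hypothetical and serve only to show the calculation.

```python
import statistics


def pearson(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores of six examinees on a test and on its retest.
first_testing = [12, 18, 25, 30, 34, 41]
second_testing = [14, 17, 27, 29, 36, 40]
coefficient_of_stability = pearson(first_testing, second_testing)

# Hypothetical scores of the same examinees on two equivalent halves
# of the test after a single administration.
half_one = [6, 10, 12, 16, 17, 20]
half_two = [6, 8, 13, 14, 17, 21]
internal_consistency = pearson(half_one, half_two)

print("coefficient of stability:", round(coefficient_of_stability, 3))
print("internal consistency    :", round(internal_consistency, 3))
```

Because every examinee keeps the same relative rank on both occasions in this made-up data, the coefficient of stability comes out close to +1.00.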

The meaning of reliability can be further clarified by noting the following important points: 

1. Reliability is concerned with the test scores or results obtained with an assessment 
instrument and not with the test or instrument itself. In other words, reliability is the 
property of the test scores and not of the test itself. Any assessment instrument may have 
a number of different reliabilities, depending upon the situation in which it is used or on 
the group involved. Therefore, it is more appropriate to speak of the reliability of test 
scores or of the assessment results than of the test or assessment itself. 


2. Reliability is a necessary but not sufficient condition for validity. A test that produces
totally inconsistent results cannot provide valid information about the test scores. On the
other hand, highly consistent test scores may be assessing or measuring the wrong thing
or may be used in inappropriate ways. Thus, a low reliability is indicative of the fact that
a low degree of validity is present but high reliability does not ensure a high degree of
validity. In brief, reliability simply provides the consistency that makes validity possible.

3. Any estimate of reliability is always concerned with a particular type of consistency. It
means that test scores (or assessment results) are not reliable in general. They are reliable
over different periods of time, over different raters, over different samples of tasks, and so
on. It is quite possible for the test scores to be consistent in one of these respects but not in
another.


Logical Meaning or Technical Meaning of Reliability 

The reliability of a test is also defined from another angle. Whenever we measure something 
either in the social sciences or physical sciences, the measurement involves some kind of error. If 
the measurement is perfectly accurate, that is, free from all kinds of errors which are attributable 
either to the imperfection in the instrument or to the personality factors of the examinees being 
tested, the reliability will be perfect and the reliability coefficient will be +1.00. But this is an ideal goal which is rarely achieved in either the social sciences or the physical sciences. Each measurement contains some error and therefore, reliability is never perfect. Thus, it can be said that each individual's score on a test has two components, the true score and the error score, as:

$x_t = x_\infty + x_e$ (5.1)

where $x_t$ = the actual obtained score; $x_\infty$ = true score; and $x_e$ = error score. ($\infty$ is the sign of infinity, which represents the true score.) The true score is the mean of a large number of scores and is free from errors occurring due to chance factors [...]. Chance errors, or random errors, are also known as [...].


Suppose we are weighing 10 girls who are each ten years old. Now our weighing machine, due to some mechanical trouble, may sometimes give high readings or low readings of weight. If we take 100 measurements (that is, each girl being weighed 10 times), chances are that the errors of measurement would cancel each other and the mean of these errors would be zero, thus leaving the true reading of weight alone. It may also happen that the weighing machine gives high readings each time (or low readings each time), a case illustrating the systematic error. In this case, errors will go on accumulating on one side and the mean of such errors would not be zero. When such errors creep into the measurement, it becomes difficult to estimate the true score. When we administer a test, many factors are likely to contribute to the error score. Some of these factors are errors in scoring, errors in test administration, fluctuations in the examinee's motivational and emotional status, guessing, misunderstanding of instructions, misleading items, etc. The reliability is directly related to the size of the error score. The smaller the error score, the more reliable the test or the measuring instrument.
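The difference between chance errors that cancel out and a systematic error that accumulates can be seen in a short simulation of the weighing example. The true weights, the size of the random fluctuation (0.5 kg) and the constant bias (1.5 kg) are assumptions made up for this sketch.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical true weights (kg) of the 10 girls.
true_weights = [28.0 + i for i in range(10)]


def weigh(true_weight, n_readings, bias, noise_sd):
    """Readings from a machine with a constant bias plus random fluctuation."""
    return [true_weight + bias + rng.gauss(0.0, noise_sd) for _ in range(n_readings)]


def mean_error(bias, n_readings=10, noise_sd=0.5):
    """Mean error over all readings (10 girls x 10 weighings = 100 readings)."""
    errors = []
    for w in true_weights:
        errors.extend(reading - w for reading in weigh(w, n_readings, bias, noise_sd))
    return statistics.fmean(errors)


random_only = mean_error(bias=0.0)   # chance errors only
with_bias = mean_error(bias=1.5)     # machine reads high every time

print("mean error, random only    :", round(random_only, 3))
print("mean error, systematic bias:", round(with_bias, 3))
```

With random errors alone the mean error stays near zero, whereas with the biased machine it stays near the size of the bias no matter how many readings are averaged.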
Since any obtained score is divided into the true score and the error score, the total variance of the test is also divided into two components: the true variance and the error variance. Variance is defined as standard deviation squared, or SD$^2$. In terms of an equation, the situation may be written as:

$\sigma_t^2 = \sigma_\infty^2 + \sigma_e^2$ (5.2)

where $\sigma_t^2$ = total variance of the test score; $\sigma_\infty^2$ = variance of true score; and $\sigma_e^2$ = variance of error score.

Thus, the variance of the test score is equal to the variance of the true score plus the variance of the error score. The two variances simply add, with no covariance term, because the correlation between $x_\infty$ and $x_e$ is assumed to be zero. In the modern test theory, the reliability of a set of test scores is logically defined as the proportion of the true variance. In other words, reliability is that part of the total variance which is the true variance. The proportions of the true variance and the error variance are found by dividing the true variance ($\sigma_\infty^2$) and the error variance ($\sigma_e^2$) respectively by the total variance ($\sigma_t^2$). Thus the proportion of true variance = $\sigma_\infty^2 / \sigma_t^2$, and the proportion of the error variance = $\sigma_e^2 / \sigma_t^2$. Now, the reliability coefficient ($r_{tt}$) by logical definition becomes:

$r_{tt} = \dfrac{\sigma_\infty^2}{\sigma_t^2}$ (5.3)

According to Equation 5.3, reliability is the proportion of true variance. The reliability coefficient can also be computed in a substitute way where the coefficient becomes equal to one minus that part of the total variance which is the error variance. The substitute formula becomes:

$r_{tt} = 1 - \dfrac{\sigma_e^2}{\sigma_t^2}$ (5.4)

Table 5.2 Hypothetical distributions of obtained, true and error scores and their means, variances and standard deviations

Students      Obtained Score  =  True Score  +  Error Score
A                 20                 22             -2
B                 14                 12             +2
C                 35                 30             +5
D                 25                 28             -3
E                 16                 18             -2
Sums:            110                110              0
Means:            22                 22              0
Variances:        58.369             43.191         15.178
SDs:               7.64               6.572          3.895

Table 5.2 shows the hypothetical relationship between obtained, true and error scores. For example, student A obtained a score of 20, had a true score of 22 and therefore, had -2 points of error. Student C obtained a score of 35, had a true score of 30 and therefore, had +5 points of error. In this table (Table 5.2), the true variance is 43.191 and the obtained variance is 58.369. The ratio 43.191/58.369 is approximately 0.74, which is the reliability of these measurements. This means that 74% of the obtained variance is attributable to true variance. If the ratio were 1.00, the obtained variance would be wholly attributable to the true variance. If the ratio were 0, there would be no relationship between the obtained scores and the corresponding true scores.

By Equation 5.3: Reliability = True variance / Obtained variance = 43.191 / 58.369 = 0.74

By Equation 5.4: Reliability = 1 - (Error variance / Obtained variance) = 1 - (15.178 / 58.369) = 1 - 0.260 = 0.74
Students should note that any test is neither perfectly reliable nor perfectly unreliable. Thus, 
reliability is not an absolute principle, rather, it is always a matter of degree. 
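The arithmetic of Equations 5.3 and 5.4 can be reproduced directly from the variances reported in Table 5.2. The variable names in this short sketch are illustrative only.

```python
# Variances as reported in Table 5.2.
obtained_variance = 58.369
true_variance = 43.191
error_variance = 15.178

# Equation 5.3: proportion of obtained variance that is true variance.
reliability_eq_5_3 = true_variance / obtained_variance

# Equation 5.4: one minus the proportion that is error variance.
reliability_eq_5_4 = 1 - error_variance / obtained_variance

print(round(reliability_eq_5_3, 2), round(reliability_eq_5_4, 2))  # 0.74 0.74
```

The two formulas agree because the true and error variances in the table add up to the obtained variance, as Equation 5.2 requires.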





METHODS (OR TYPES) OF RELIABILITY 

There are four common methods of estimating the reliability coefficient of test scores. These methods are: (i) Test-retest reliability, (ii) Internal consistency reliability, (iii) Parallel-forms reliability, or Alternate-forms reliability, or Equivalent-forms reliability, or Comparable-forms reliability, and (iv) Scorer reliability. Test-retest reliability and parallel-forms reliability are the external consistency procedures because they compare findings from two independent processes of data collection with each other for verifying the reliability of the measure. A detailed discussion of each of these methods follows.


Test-Retest Reliability 

In test-retest reliability, the single form of the test is administered twice on the same sample with a 
reasonable time gap. In this way, two administrations of the same test yield two independent sets 
of scores. The two sets, when correlated, give the value of the reliability coefficient. The 
reliability coefficient thus obtained is also known as the temporal stability coefficient and 
indicates to what extent the examinees retain their relative position as measured in terms of the 
lest score over a given period of time. A high test-retest reliability coefficient indicates that the 
€xaminee who obtains a low score on the first administration tends to score low on the second 


ee Pea a hy my 


administration, and its converse, when the OAMIMEE GCOPE™ Nigh OM thy 
tends to score high on the second administration. tn ¢ OMPULING retest ; we fj 
is often faced with the problem. ot determining a FEASONAbIE ting 
administrations ot the test. There have been some disagreements Te) iP betw 
< j 4 : nll by 
Linnie lapse between the two Administrations of the test, When the line hy ‘Pers Hay 
increase the reliability coefficient due to carryover and practice effect a Shor, iis Hh, 
other hand, is too long, it is likely to lower the reliability coeffi ient The te ime gi ; | 
4 ene 4 r OP Ap Ie os . Z on Reis =the «ory , "Ps Oh a 
convenient time gap be tween the two administrations Isa fortnight, whic} “PPro, the 
too short nor too long. There are e idences (0 support that thi ie wOMSidereg 
| nt. Viet 
Test-retest reliability has its disadvantages. Factors relating to the disadvantages
contribute to the error variance of test scores. The test-retest method is a time-consuming method
of estimating the reliability coefficient. This method assumes that the examinee's physical and
psychological set-up remains unchanged in both the testing situations. But in reality, this is not
so. In fact, the examinee's health, emotional condition, motivational condition and his mental
set-up do not remain perfectly uniform. Not only this, the examiner's physical and mental
make-up also changes. Besides, some uncontrolled environmental changes may take place
during the administration of the test. All these factors are likely to make the second administration
different from the first administration and thus, the examinee's relative position is likely
to change, thereby lowering the reliability coefficient. Obviously, such factors contribute to the
error variance and reduce the proportion of the true variance in total variance. In a nutshell, the
source of error variance in the test-retest method is time sampling. Maturational effects also
operate in contributing to the error variance. When the examinees are young children and the
time interval between the two administrations is a comparatively long one, such effects are
obvious. Since maturational growth is not uniform for all young examinees, it is likely to
produce a wider fluctuation in test score on the second administration, thus lowering the
reliability coefficient of the test. Besides, when the examinee is once acquainted with the
items and their mode of answer, he is likely to develop a skill which may help him in the second
administration. He is also likely to memorize many answers given in the first administration,
especially if the second administration follows a week after the first one. All the acquired skill,
knowledge and memory of the first answers are likely to help examinees in answering them in a
more or less similar way the second time, thus helping them in retaining their same relative
position. Obviously, these factors contribute to the true variance and are also likely to inflate the
reliability coefficient of the test score. Apart from this, tests that measure constantly changing
characteristics are not appropriate for test-retest evaluation. Despite all these limitations, the
test-retest method is the most appropriate method of estimating reliability of both the speed test
and the power test. For a heterogeneous test, too, the test-retest method is the most appropriate
method of computing reliability.
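
As a rough illustration of the temporal stability coefficient described above, the sketch below correlates the scores from two administrations of the same test. The scores are hypothetical, and the helper name `pearson_r` is an assumption made for this example rather than anything taken from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of six examinees on two administrations a fortnight apart.
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 10]

# The correlation between the two sets is the test-retest reliability estimate.
test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 2))
```

A value near 1 means the examinees largely kept their relative positions across the two administrations.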


Internal Consistency Reliability 

Internal consistency reliability indicates the homogeneity of the test. If all the items of the test 
measure the same function or trait, the test is said to be a homogeneous one and its internal 
consistency reliability would be pretty high. The most common method of estimating internal 
consistency reliability is the split-half method in which the test is divided into two equal or nearly 
equal halves. The common way of splitting the test is the odd-even method. Almost any split can
be accepted except the first half and the second half of the items. A division of this sort is not
preferred because the nature of items in the two halves in a power test is different. Usually, the
easier items are placed at the beginning or in the first half of the test and the comparatively
difficult items are placed towards the end of the test or in the second half of the test. However, the
odd-even method can be reasonably applied for the purpose of splitting. In this method, all
odd-numbered items (like 1, 3, 5, 7, 9, etc.) constitute one part of the test and all even-numbered
items (like 2, 4, 6, 8, 10, etc.) constitute another part of the test. Each examinee, thus, receives
two scores: the number of correct answers on all odd-numbered items constitutes one score and
the number of correct answers on all even-numbered items constitutes another score for the same
examinee. In this way, from a single administration of the single form of the test, two sets of scores
are obtained. Product moment (PM) correlation is computed to obtain the reliability of the half
test. On the basis of this half-test reliability, the reliability for the whole test is estimated. The
most important source of error variance in the split-half technique is content sampling or item
sampling, in which scores on the test tend to differ due to the particular nature of items or due to
differences in selection of items. Table 5.3 illustrates how a worksheet for the odd-even reliability
may be prepared.

Table 5.3 Worksheet for the odd-even reliability

Examinee    Number of correct answers     Number of correct answers
            on odd-numbered items         on even-numbered items
A           16                            15
B           14                            13
C           18                            12
D           12                            4
E           19                            18
F           8                             7
G           7                             8


In Table 5.3, each examinee has two scores: one for the odd-numbered items and another
for the even-numbered items. Subsequently, PM correlation is computed between the two sets of
scores. This amounts to reliability of the half test. When the reliability coefficient of the half test is
known, the Spearman-Brown prophecy formula is used for estimating the reliability of the whole
test. The formula is:


$$r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}} \tag{5.5}$$

where $r_{tt}$ is the reliability coefficient of the whole test and $r_{hh}$ is the reliability coefficient
of the half test. Equation 5.5 may also be written as follows:

$$\text{Reliability of the whole test} = \frac{2 \times \text{reliability of half test}}{1 + \text{reliability of half test}}$$

To illustrate, suppose the reliability coefficient of the half test ($r_{hh}$) is 0.70. Then the reliability
coefficient of the whole test ($r_{tt}$) by Equation 5.5 becomes:

$$r_{tt} = \frac{2 \times 0.70}{1 + 0.70} = \frac{1.40}{1.70} = 0.8235 \approx 0.82$$
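
The odd-even procedure and the Spearman-Brown step can be sketched in a few lines. This is only an illustrative sketch: the function names (`odd_even_scores`, `spearman_brown`, `pearson_r`) and the small response matrix are assumptions made for the example, not part of the text. Only the value 0.70 comes from the worked illustration above.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_scores(item_matrix):
    """Split each examinee's 0/1 responses into odd- and even-numbered item scores.

    Items are numbered from 1, so list index 0 is item 1 (an odd-numbered item).
    """
    odd = [sum(row[0::2]) for row in item_matrix]
    even = [sum(row[1::2]) for row in item_matrix]
    return odd, even


def spearman_brown(r_half):
    """Step the half-test reliability up to the whole test (Equation 5.5)."""
    return 2 * r_half / (1 + r_half)


# The worked illustration: a half-test reliability of 0.70.
print(round(spearman_brown(0.70), 2))  # 0.82

# Hypothetical responses of five examinees to a six-item test (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
odd, even = odd_even_scores(responses)
r_half = pearson_r(odd, even)      # reliability of the half test
r_whole = spearman_brown(r_half)   # estimated reliability of the whole test
```

For any positive half-test correlation below 1, the stepped-up value is larger than the half-test value, which is why the correction is needed before reporting whole-test reliability.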


The advantage of the split-half method is that all data necessary for the computation of the
reliability coefficient are obtained in a single administration of the test. Thus the variability
produced by the difference in the two administrations of the same test is automatically
eliminated. Therefore, a quick estimate of the reliability is made. That is why Guilford and
Fruchter (1973, 409) have described it as on-the-spot reliability. The demerit is that since
both the sets of scores are obtained on one occasion, fluctuations due to changes in the
temporary conditions within the examinee as well as the temporary conditions of the
external environment will operate in one direction, that is, either favourably or unfavourably, the
obvious result of which would be either an enhancement or a depression of the real
reliability. Another demerit of the split-half method is that it should not be used with speed
tests, because with such tests the reliability coefficient is overestimated. A test can be divided
into two halves through different methods and it has been found that each method yields a
different estimate of reliability. Undoubtedly, this is another weakness of the split-half technique
of estimating reliability.


Rulon and Flanagan Formulas


An estimate of internal consistency is also made through the Rulon formula and the Flanagan
formula. Both these formulas provide the reliability of the whole test (that is, total test) and
not the half test, and both formulas estimate the reliability coefficient on the basis of the proportion
of error variance in total variance of the test. The lower the error variance, the higher the true
variance and therefore, the higher will be the reliability.
The use of the Rulon formula requires that the test should be divided into two equal halves
either through the odd-even method or by any other method. Thus each examinee would have
one subtotal score on odd-numbered items and another subtotal score on even-numbered items.
A simple difference between the two subtest scores would indicate the error of measurement
or chance error of each examinee and thus, would give an idea of the error variance. The Rulon
formula is:

$$r_{tt} = 1 - \frac{\sigma_d^2}{\sigma_t^2} \tag{5.6}$$

where $r_{tt}$ = reliability coefficient; $\sigma_d^2$ = variance (that is, standard deviation squared) of the
difference between two half scores for each examinee; and $\sigma_t^2$ = variance (or square of the
standard deviation) of the total score. Total score for an examinee is the sum of his scores on the
two halves of the test. Computation of reliability coefficient by the Rulon formula is illustrated in
Table 5.4 in which an odd-even subscore of examinees on a 12-item test has been given.


Table 5.4 Worksheet for the computation of reliability by the Rulon formula and
the Flanagan formula

Columns: Examinee; Odd ($X_o$); Even ($X_e$); Difference $d = X_o - X_e$; $d^2$; $X_t$; $X_t^2$; $X_o^2$; $X_e^2$

[Rows for the individual examinees are not legible.]

$N = 10$; $\sum X_o = 43$; $\sum X_e = 37$; $\sum X_o^2 = 207$; $\sum X_e^2 = 171$





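$$\sigma_d = \frac{1}{N}\sqrt{N\sum d^2 - \left(\sum d\right)^2} = \frac{1}{10}\sqrt{(10)(10) - (6)^2} = \frac{1}{10}\sqrt{64} = 0.80$$

$$\sigma_d^2 = (0.80)^2 = 0.64$$

$$\sigma_t = \frac{1}{N}\sqrt{N\sum X_t^2 - \left(\sum X_t\right)^2} = \frac{1}{10}\sqrt{(10)(746) - (80)^2} = \frac{1}{10}\sqrt{1060} = 3.26$$

$$\sigma_t^2 = (3.26)^2 = 10.6276 \approx 10.63$$

Now, reliability coefficient by Equation 5.6 is

$$r_{tt} = 1 - \frac{0.64}{10.63} = 1 - 0.0602 = 0.9398 \approx 0.94$$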

The Flanagan formula for estimating reliability is very similar to the Rulon formula. In the
Flanagan formula, variance of the score of odd-numbered items and the score of the
even-numbered items are calculated separately and then, an estimate of error variance is made.
Thus, unlike the Rulon formula, it is not based upon the difference of the two half scores. The
Flanagan formula is:

$$r_{tt} = 2\left(1 - \frac{\sigma_o^2 + \sigma_e^2}{\sigma_t^2}\right) \tag{5.7}$$

where $r_{tt}$ = reliability coefficient; $\sigma_o^2$ = variance of scores of the first half; $\sigma_e^2$ = variance of
scores of the second half; and $\sigma_t^2$ = variance of the total score. The computation of the reliability
coefficient by Equation 5.7 from the data in Table 5.4 may be illustrated as follows:

$\sigma_o$ of scores of the odd-numbered items:

$$\sigma_o = \frac{1}{N}\sqrt{N\sum X_o^2 - \left(\sum X_o\right)^2} = \frac{1}{10}\sqrt{(10)(207) - (43)^2} = \frac{1}{10}\sqrt{2070 - 1849} = 1.49$$

$$\sigma_o^2 = (1.49)^2 = 2.2201 \approx 2.22$$

$\sigma_e$ of the scores of even-numbered items:

$$\sigma_e = \frac{1}{N}\sqrt{N\sum X_e^2 - \left(\sum X_e\right)^2} = \frac{1}{10}\sqrt{(10)(171) - (37)^2} = \frac{1}{10}\sqrt{1710 - 1369} = 1.85$$

$$\sigma_e^2 = (1.85)^2 = 3.4225 \approx 3.42$$

Now, $r_{tt}$ according to Equation 5.7 is given by

$$r_{tt} = 2\left(1 - \frac{2.22 + 3.42}{10.63}\right) = 2\left(1 - \frac{5.64}{10.63}\right) = 2(0.4694) = 0.939 \approx 0.94$$

Thus, the Rulon formula and the Flanagan formula have yielded the same coefficient of
reliability from the data of Table 5.4, which automatically checks the accuracy of the
computation. These two formulas are also applicable to the computation of reliability of
alternate forms of a test. The advantage of these two formulas over the Spearman-Brown
formula is that one need not calculate the reliability coefficient of the half-test scores.


Kuder-Richardson Formulas

Kuder and Richardson (1937) did a series of researches to remove some of the difficulties of the
split-half method of estimating reliability. They were dissatisfied with split-half methods of
estimating reliability and therefore, they devised their own formulas for estimating the internal
consistency of the test. Their formulas 20 and 21 have become popular and well known.
K-R 20 is the basic formula for computing the reliability coefficient and K-R 21 is the simplified
form of K-R 20. The main requirements for the use of the K-R formulas are:

(i) All items of the test should be homogeneous, that is, each item should measure the same
factor or factors in the same proportion. In other words, the test should have inter-item
consistency, which is indicated by high inter-item correlation. Thus, the test should be a
unifactor one.

(ii) Items should be scored either as +1 or 0, that is, all correct answers should be scored as
+1 and all incorrect answers should be scored as zero.

(iii) For K-R 20, items should not vary much in their indices of difficulty and for K-R 21, all
items should be of the same difficulty value. If the indices of difficulty of items are not
equal, the value of reliability yielded by K-R 21 would be substantially lowered from that
computed from K-R 20.

K-R 20 is:

$$KR_{20} = \left(\frac{n}{n-1}\right)\left(\frac{\sigma_t^2 - \sum pq}{\sigma_t^2}\right) \tag{5.8}$$

where $KR_{20}$ = reliability coefficient by K-R 20; $n$ = number of items in the test;
$\sigma_t^2$ = variance of scores on the test; $p$ = proportion of correct answer to each item;
$q$ = proportion of incorrect answer to each item; hence it is equal to $1 - p$.

K-R 20 requires that the investigator must have the item analysis worksheet ready before him
and only then can he compute the reliability coefficient. This is because the formula requires a
knowledge of the difficulty value (proportion of correct answer) of each item. As such, the
computation of the reliability coefficient through K-R 20 involves a considerable amount of work.
Therefore, another formula has been suggested in which $\sum pq$ has been substituted by other terms.
The use of K-R 21 does not demand the item analysis worksheet. The information needed for
K-R 21 is the mean of total test score, SD of the total score and number of items in the test. The
K-R 21 formula is, in fact, the simplification of K-R 20 arrived at by assuming that all items are of
the same difficulty value. This is known as formula 21 which is expressed as:

$$KR_{21} = \left(\frac{n}{n-1}\right)\left(\frac{\sigma_t^2 - n\bar{p}\bar{q}}{\sigma_t^2}\right) \tag{5.9}$$

It is obvious that $\sum pq$ of K-R 20 has been replaced by $n\bar{p}\bar{q}$ in K-R 21. $\bar{p}$ is equal to the
average of correct proportion to each item and $\bar{q}$ is the average of the incorrect proportion to each
item. Other symbols are defined as usual. Statistically, $\bar{p} = \frac{M_t}{n}$ and $\bar{q} = 1 - \frac{M_t}{n}$ (or $\frac{n - M_t}{n}$),
where $M_t$ is the mean of the total score and $n$ is the number of items. K-R 21 may be simplified as:

$$KR_{21} = \frac{n\sigma_t^2 - M_t(n - M_t)}{\sigma_t^2(n - 1)} \tag{5.10}$$

where $M_t$ is the mean of the total score of the test and other symbols are defined like those in
Equation 5.8. A simple look at K-R 20 and K-R 21 reveals that both these formulas are based upon
item statistics like $p$ and $q$. However, Equation 5.10 (a modification of K-R 21) does not have
item statistics like $\bar{p}$ and $\bar{q}$; it is based upon total score statistics like mean of the total score and
standard deviation of the total score. Equation 5.10 is very useful in estimating the reliability
coefficient of the test in advance.

Table 5.5 illustrates the use of K-R 20 and K-R 21, where the item analysis worksheet showing
item score and total score of 10 examinees on a 15-item test has been given. (Correct response has
been given a score of 1 and incorrect response a score of zero.)

$$M_t = \frac{\sum X}{N} = \frac{80}{10} = 8$$

$$SD = \frac{1}{N}\sqrt{N\sum X^2 - \left(\sum X\right)^2} = \frac{1}{10}\sqrt{(10)(796) - (80)^2} = 3.95$$

$$\sigma_t^2 = (3.95)^2 = 15.6025 \approx 15.60$$

$$\bar{p} = \frac{M_t}{n} = \frac{8}{15} = 0.5333, \qquad \bar{q} = \frac{n - M_t}{n} = \frac{15 - 8}{15} = 0.4667$$

Now, the reliability coefficient by K-R 20 (that is, with Equation 5.8), taking $\sum pq = 2.62$ from
the item analysis worksheet, will be:

$$KR_{20} = \left(\frac{15}{15 - 1}\right)\left(\frac{15.60 - 2.62}{15.60}\right) = \left(\frac{15}{14}\right)\left(\frac{12.98}{15.60}\right) = (1.071)(0.832) = 0.891 \approx 0.89$$

The reliability coefficient from the same data by K-R 21 (that is, with Equation 5.9) will be:

$$KR_{21} = \left(\frac{15}{15 - 1}\right)\left(\frac{15.60 - (15)(0.5333)(0.4667)}{15.60}\right) = \left(\frac{15}{14}\right)\left(\frac{11.867}{15.60}\right) = (1.071)(0.7607) = 0.815 \approx 0.82$$

A simple approximation of K-R 21 is done through Equation 5.10, which estimates the reliability
coefficient as follows:

$$KR_{21} = \frac{(15)(15.6) - 8(15 - 8)}{15.6(15 - 1)} = \frac{234 - 56}{218.4} = \frac{178}{218.4} = 0.815 \approx 0.82$$
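
The three Kuder-Richardson expressions can be checked against one another numerically. The following is a minimal sketch working from summary statistics only; the function names are assumptions made for this example, and the inputs (15 items, total-score variance 15.60, mean 8, and a sum of pq of 2.62 as implied by the worked figures) are taken from the illustration above.

```python
def kr20(n_items, total_variance, sum_pq):
    """K-R 20 (Equation 5.8) from the number of items, total variance and sum of pq."""
    return (n_items / (n_items - 1)) * (total_variance - sum_pq) / total_variance


def kr21(n_items, total_variance, mean_total):
    """K-R 21 (Equation 5.9), using the average difficulty p-bar = mean / n."""
    p_bar = mean_total / n_items
    q_bar = 1 - p_bar
    return (n_items / (n_items - 1)) * (
        total_variance - n_items * p_bar * q_bar
    ) / total_variance


def kr21_simplified(n_items, total_variance, mean_total):
    """Equation 5.10: the same quantity written with total-score statistics only."""
    numerator = n_items * total_variance - mean_total * (n_items - mean_total)
    return numerator / (total_variance * (n_items - 1))


r20 = kr20(15, 15.60, 2.62)
r21 = kr21(15, 15.60, 8)
r21_simple = kr21_simplified(15, 15.60, 8)

print(round(r20, 2))         # 0.89
print(round(r21, 3))         # 0.815
print(round(r21_simple, 3))  # 0.815
```

Equations 5.9 and 5.10 are the same expression rearranged, so they agree exactly when unrounded values of the average difficulty are used; the K-R 21 value falls below K-R 20 here because the items are not of equal difficulty.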





It is obvious that if the reliability coefficient is computed from the same data by K-R 20 and
K-R 21 (or its simple approximation formula), K-R 21 always underestimates the coefficient.
Students should note carefully that data in Table 5.5 do not fulfil the exact requirement of the
K-R formulas because items differ widely in difficulty value. If the split-half reliability were to be
computed from this data it would, perhaps, be larger than the coefficient obtained by K-R 20. The
underestimation of coefficient by K-R 20 (when compared to split-half reliability) would be due to
the wider differences in difficulty values of all the 15 items.
The Kuder-Richardson formulas have such wide applicability in estimating the reliability
coefficient of a test that many researches have been conducted to modify the formulas so that
they may be used even in those situations where the requirement for the use of the formulas is
least satisfied. Dressel (1940) has modified the K-R formula to apply to even those situations
where items are differently scored. As we know, one of the requirements of the K-R formulas is
that correct items should be given a score of +1 and incorrect items should be given a score of
zero. Dressel's modified K-R formulas can be applied even to those situations where the correct
answers are given different scores such as +2, +3, +1, etc., and wrong answers are given a score
of zero. Dressel's formula is as follows:

$$r_{tt} = \left(\frac{n}{n-1}\right)\left(\frac{\sigma_t^2 - \sum W^2 pq}{\sigma_t^2}\right) \tag{5.11}$$

where $W$ = the weight or score assigned to each correct answer of the item; and other symbols are
defined as those in Equation 5.8.


Likewise, Ferguson (1951) has further extended the K-R formulas to apply to situations
where each item has more than two scores, e.g., +1, 0, -1, etc. Tucker (1949) has also provided a
modification of K-R 20. The modified formula is very similar to K-R 21 with one additional term.


$$r_{tt} = \left(\frac{n}{n-1}\right)\left(\frac{\sigma_t^2 - n\bar{p}\bar{q} + n\sigma_p^2}{\sigma_t^2}\right) \tag{5.12}$$

where $\sigma_p^2$ = variance of the proportion of correct answers to all items, which is statistically
equal to:

$$\sigma_p^2 = \frac{\sum p^2}{n} - \bar{p}^2 \tag{5.13}$$

All other symbols in Equations 5.12 and 5.13 are defined as those in Equation 5.9.
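
A short numerical check shows why the additional term restores K-R 20: the sum of pq over items equals n times p-bar times q-bar, minus n times the variance of the item difficulties. The sketch below assumes that reading of Tucker's modification; the function names, the difficulty values and the total variance are hypothetical.

```python
def kr20_from_difficulties(p_values, total_variance):
    """K-R 20 computed directly from the item difficulty values p."""
    n = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)
    return (n / (n - 1)) * (total_variance - sum_pq) / total_variance


def tucker_kr20(p_values, total_variance):
    """K-R 21 plus the correction term n * var(p), as assumed for Equation 5.12."""
    n = len(p_values)
    p_bar = sum(p_values) / n
    q_bar = 1 - p_bar
    # Variance of the item difficulties: mean of p squared minus squared mean of p.
    var_p = sum(p * p for p in p_values) / n - p_bar ** 2
    return (n / (n - 1)) * (
        total_variance - n * p_bar * q_bar + n * var_p
    ) / total_variance


# Hypothetical difficulty values for a six-item test and its total-score variance.
difficulties = [0.9, 0.8, 0.7, 0.5, 0.4, 0.2]
total_var = 3.2

direct = kr20_from_difficulties(difficulties, total_var)
via_tucker = tucker_kr20(difficulties, total_var)
print(round(direct, 4), round(via_tucker, 4))
```

When all items have the same difficulty the variance of p is zero and the expression reduces to K-R 21.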
K-R formulas have some limitations too. If the K-R formulas are used in a heterogeneous
test where each item measures different functions or traits, they will underestimate the actual
coefficient of reliability. Likewise, if they are used in a test where items differ widely in their
indices of difficulty, they would again yield a lower coefficient of reliability. Finally, with speed
tests or a test which is very close to a speed test, the K-R formulas should not be used.

Mathematically, it has been shown that K-R reliability coefficient is actually the mean of all
split-half coefficients resulting from the different splittings of a test (Cronbach 1951). However,
this is strictly true when the split-half coefficients are estimated by Rulon formula and not when
they are found by correlation of halves and Spearman-Brown formula (Novick & Lewis 1967).
The source of error variance in K-R coefficients is content sampling.


Coefficient alpha: The Kuder-Richardson formulas are applicable to the tests whose items
are scored as 0 or 1 (right or wrong) or according to some other all-or-none system. Some tests,
however, may have multiple-scored items. We often see that on a personality inventory, testee or
respondent receives a different numerical score on an item, depending upon whether he checks
'always', 'sometimes', 'rarely' or 'never'. For calculating reliability of such tests, a generalized
formula known as Coefficient alpha or $\alpha$ (also called Cronbach's alpha) has been formulated
(Cronbach 1951; Kaiser & Michael 1975). It simply means that coefficient alpha estimates the
internal consistency of the test in which items are not scored as 0 or 1 (right or wrong). The
formula for coefficient alpha is:

$$r_{tt} = \left(\frac{n}{n-1}\right)\left(\frac{\sigma_t^2 - \sum \sigma_i^2}{\sigma_t^2}\right) \tag{5.14}$$

Here meaning of the various terms of the formula are the same as those of Formula 5.8,
except $\sum(\sigma_i^2)$. Formula 5.14 appears quite similar to K-R 20 formula. However, the difference is
that $\sum pq$ has been replaced by $\sum(\sigma_i^2)$, the sum of variances of item scores. The procedure is to find
out the variance of all individual scores on each item and then to add these variances across all
items. Psychometricians today regard the coefficient alpha as the most general method of finding
estimates of reliability through internal consistency. In fact, Cronbach's alpha tells us the overall
consistency of the measure, that is, how much high responses go with highs and low responses
go with lows across the items in the measure. In general, in psychological and educational research
a measure should have a Cronbach's alpha of at least .60 and preferably closer to .90.
Mathematically, Cronbach's alpha is the equivalent of the average of all possible split-half
coefficients of the test. It ranges from zero (zero internal consistency) to 1 (perfect internal
consistency) provided certain assumptions are met. A negative alpha indicates that
the items of the test are negatively correlated and that an inappropriate reliability model has been
used. The sources of error variance in coefficient alpha are content sampling and content
heterogeneity.
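
The procedure of finding each item's variance and summing across items can be sketched as follows. This is an illustrative sketch only: the function names and the ratings are hypothetical, responses are assumed to be coded 0 ('never') to 3 ('always'), and the population variance (dividing by N) is assumed throughout.

```python
def variance(scores):
    """Population variance (divide by N)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((s - mean) ** 2 for s in scores) / n


def cronbach_alpha(item_matrix):
    """Coefficient alpha for a matrix with one row per respondent, one column per item."""
    n_items = len(item_matrix[0])
    item_columns = list(zip(*item_matrix))          # transpose: one tuple per item
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in item_matrix])
    return (n_items / (n_items - 1)) * (
        total_variance - sum_item_variances
    ) / total_variance


# Hypothetical ratings of five respondents on a four-item inventory
# (0 = never, 1 = rarely, 2 = sometimes, 3 = always).
ratings = [
    [3, 2, 3, 3],
    [2, 2, 1, 2],
    [1, 0, 1, 1],
    [3, 3, 2, 3],
    [0, 1, 0, 1],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))  # 0.945
```

For items scored 0 or 1, each item variance equals pq, so the same function returns the K-R 20 value.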


Alternate-Forms Reliability

Alternate-forms reliability is known by various names such as the parallel-forms reliability,
equivalent-forms reliability and the comparable-forms reliability. Alternate-forms reliability
requires that the test be developed in two forms, which should be comparable or equivalent. Two
forms of the test are administered to the same sample, either immediately the same day or after a
time interval, usually of a fortnight. When the reliability is calculated on the basis of two
administrations given immediately, it is called alternate-form (immediate) reliability, and when
the reliability is calculated on the basis of a time gap of a fortnight, it is called alternate-form
(delayed) reliability. In the former, the source of error variance is content sampling, whereas in
the latter, the sources of error variance are time sampling, content sampling and content
heterogeneity. The Pearson r between the scores obtained from the two equivalent forms becomes
the measure of reliability. Such a coefficient is known as the coefficient of equivalence.
Alternate-forms reliability measures the consistency of the examinees' scores between two
administrations of parallel forms of a single test. A very short time interval between the
administrations of the two forms may help the examinees in maintaining the same position on the
second form. This would apparently contribute to the true variance and would tend to raise the
coefficient. Drastic changes in the content of the two forms of the test contribute to the error
variance and will tend to reduce the reliability coefficient. The selection of the time interval is
likely to produce demerits similar to test-retest reliability. However, the practice effect and the
memory effect, including recall of items, which occur in test-retest reliability because identical
items are used in the two administrations, are automatically controlled in parallel forms of the
test because the second form is similar but not identical to the first form.

A problem in the parallel-forms test is how to make both the forms equivalent in the true sense
(because, if the forms are not equivalent, the reliability coefficient may not be based upon the
true variances). Gulliksen (1950) has defined parallel tests (or equivalent tests) as tests having
equal means, equal variances and equal inter-item correlations. Freeman (1962, 72) has given
the following criteria for judging whether or not the two forms of the test are parallel.

1. The number of items in both the forms should be the same.
2. Items in both the forms should have uniformity regarding the content, the range of
   problems, and the adequacy of sampling.
3. Distribution of the indexes of difficulty of items in both should be similar.
4. Items in both the forms should be of equal degree of homogeneity, which can be shown
   either by inter-item correlation or by correlating each item with subtest scores or with
   total test scores.
5. Means and standard deviations of both the forms should be equal or nearly so.
6. Mode of administration and scoring of both the forms should be uniform.

One can thus visualise the difficulty involved in making the two forms of a test parallel
because it is very difficult, if not impossible, to meet all the above criteria. Moreover, such a test
involves too much labour and time because all items in a changed language are written twice for
the two separate forms. A test in order to be called a parallel test must meet the above criteria very
closely, if not perfectly. Alternate-forms reliability is most appropriate when a speed test is being
constructed. But this does not mean that it cannot be applied to a power test.

Scorer Reliability

There are tests such as tests of creativity and projective tests of personality which leave a lot to the
judgement of the scorer. In fact, such tests are in as much need of scorer reliability as there is for
the more usual reliability coefficients. Scorer reliability is the reliability which can be estimated
by having a sample of tests independently scored by two or more examiners. The two sets of scores
obtained by each examiner are correlated in the usual way and the resulting correlation
coefficient is known as scorer reliability. This type of reliability is needed specially when
subjectively scored tests are employed in research. The source of error variance is interscorer
differences.

Test-retest reliability, internal consistency reliability and alternate-forms or parallel-forms
reliability express reliability in terms of the correlation coefficient. As we know, the correlation
coefficient is always a relative measure. The reliability obtained by the said methods is,
therefore, also known as the relative reliability. Sometimes analysis of variance (ANOVA) is also
applied as a measure of relative reliability. The analysis of variance technique has been used by
Hoyt (1941), Jackson (1939) and others. In applying the analysis of variance technique to
individual items and determining the error variance thereupon, Hoyt has made four assumptions.
The first assumption is that the total score of an examinee on a test can be divided into four
independent components: (a) a component which is common to all examinees and to all items of
the test; (b) a component associated with items only; (c) a component associated with the
examinees only; and (d) the error component, which is independent of the first three factors. The
second assumption is that the variance of the error component of each item is equal. The third
assumption is that the error component for each item is normally distributed. The fourth
assumption is that the error component of any two items is independent. Hoyt's formula for the
reliability coefficient is:

$$r_{tt} = 1 - \frac{V_e}{V_x} \tag{5.15}$$

where $r_{tt}$ = reliability coefficient; $V_e$ = error variance; and $V_x$ = variance among examinees.
It has been demonstrated by many research workers that Equation 5.15 yields a reliability
coefficient which becomes identical with that obtained from K-R 20. And, therefore, some of the
limitations of K-R 20 automatically apply to Hoyt's formula. For example, Hoyt's formula should
not be used in a test where speed is an important factor. The analysis of variance approach for
estimating the reliability coefficient can also be applied to data obtained from alternate forms of
a test and retest administrations of the single form of a test.

WHAT IS A SATISFACTORY SIZE FOR THE RELIABILITY COEFFICIENT?

There is no hard and fast rule to determine the satisfactory level of the reliability coefficient of a
test. In fact, whether the obtained coefficient of correlation is satisfactory or not, is judged by the
purpose for which the test is given to the examinees. If the purpose is to segregate superiors and
inferiors (that is, to make individual diagnosis), as is true with intelligence tests, aptitude tests and
achievement tests, the reliability coefficient of 0.90 or higher is regarded as the best coefficient.
Likewise, where the purpose of the test is to compare the means of the two groups of narrow
range, a reliability coefficient of 0.50 or 0.60 would suffice.

High levels of reliability, in general, are most necessary when (a) tests are administered for
taking a final decision about people, and (b) persons are categorized upon relatively small individual
differences. For example, tests are often used for placement in the armed forces. If the tests used
in the placements are unreliable, the decisions made regarding thousands of persons will also be
unreliable.

Lower levels of reliability are acceptable when

(a) tests are used for preliminary rather than final decisions, and

(b) tests are used for categorizing people depending upon gross individual differences.

There are sometimes practical and theoretical obstacles in maximizing the reliability of tests.
The practical obstacles may be exemplified by the fact that a test with a reliability of 0.90 might be
twice as expensive to develop as a test with a reliability of 0.85. Thus the gain in reliability
might not be worth the cost incurred. There are also some theoretical reasons why it is impossible
to achieve high levels of reliability in measuring a particular trait/ability. In fact, there is an
inevitable conflict between the goal of attaining precision (reliability) in measurement and the
goal of attaining breadth. This is known as the Bandwidth-Fidelity dilemma (Cronbach & Gleser
1965; Shannon & Weaver 1949). These two terms, that is, Bandwidth and Fidelity, have been
taken from communication theory. Bandwidth refers to the amount of information contained in a
message, while fidelity refers to the accuracy with which the information is passed. The greater
the amount of information to be conveyed (bandwidth), the lower the fidelity; and the greater the
fidelity desired, the less information can be conveyed. In a similar way, bandwidth and fidelity
apply to the field of testing. For any test, the researcher has to decide between measuring a very
specific attribute with a high degree of accuracy or measuring a broader attribute with a lesser
degree of accuracy. For example, suppose the researcher constructs a 90-item test of history for
assessing the knowledge of history in the 15th century. On the other hand, he constructs the same
90-item test for assessing knowledge of history from ancient to modern period. In the former case,
the test would be a highly reliable measure because bandwidth is limited and fidelity is high,
whereas in the latter case, the test would be able to measure each student's general knowledge of
history but would not be able to obtain a very accurate or reliable measure (poor fidelity) of such
a broadly defined area (greater bandwidth).


STANDARD ERROR OF MEASUREMENT 

So far, we have discussed only relative reliability which is expressed in terms of the coefficient of
correlation. Many researchers prefer to express the reliability of test scores in terms of a
statistic called standard error of measurement rather than in terms of the reliability coefficient. As
we know, the reliability coefficient is affected by the variability in the range of scores. The greater
the variability in the total scores, the higher the standard deviation and therefore, the higher the
reliability coefficient. A homogeneous set of test scores tends to produce lower reliability. One of
the features of standard error of measurement is that it is not influenced by the variability in the
range of scores. Standard error of measurement is almost the same even when the total scores of the
sample vary widely. Statistically, standard error of measurement is defined as the standard
deviation of the error component (or score) in the obtained test scores. Standard error of
measurement may also be defined from the angle of true score and then, standard error of
measurement becomes the standard deviation of a sample of scores of an examinee about his true
score. For calculating standard error of measurement, one must know either the error score or the
true score of each examinee. If one can find out the error score (or error of measurement) of each
examinee's test score, it is possible to calculate the standard deviation of the error score and the
resulting statistic would be known as the standard error of measurement. But there is no way of
calculating the error score directly and as such the standard error of measurement cannot be
calculated directly. If the same test is repeated to the same group of examinees an unlimited
number of times, the means of all such obtained scores for each examinee will be their true
scores. Subsequently, the standard deviation of these scores (on that test) could be calculated and
the resulting statistic would be known as the standard error of measurement. But again the
problem is that it is not possible for anyone to repeat the same test on the same group of
examinees an unlimited number of times, and hence the true score cannot be determined and
therefore, the standard error of measurement cannot be statistically determined.

The standard error of measurement is calculated indirectly on the basis of the standard deviation of the test scores and the reliability of the test. The formula for the standard error of measurement is:

$$SE_{meas} = \sigma_t \sqrt{1 - r_{tt}} \tag{5.16}$$

where $SE_{meas}$ = standard error of measurement; $\sigma_t$ = standard deviation of the test scores; and $r_{tt}$ = the reliability coefficient of the test calculated by any method of estimating the reliability.

To illustrate, suppose the reliability coefficient of a test is 0.90 and the standard deviation of the test is 10. Substituting these values in Equation 5.16, $SE_{meas} = 10\sqrt{1 - 0.90} = 10\sqrt{0.10} = (10)(0.316) = 3.16$. Now, suppose that an examinee's obtained score on the test is 80. Interpreting this score, it can be said that the chances are 95 in 100 (5% level of significance) that this obtained score is not more (or less) than 6.2 units (3.16 × 1.96) from his true score. In other words, only 5 scores out of 100 (if the same examinee is given the same test 100 times) will be less than 73.8 or greater than 86.2. Similarly, the chances are 99 in 100 (1% level of significance) that the obtained score will not be more (or less) than 8.2 units (3.16 × 2.58) from his true score. Thus the standard error of measurement provides a direct indication of the reliability of test scores. In all these cases, the true score is regarded as the mean of an unlimited number of scores of the same individual on the same test.

If the reliability coefficient of a test is 1.00 (which is rare), the standard error of measurement will be zero because the right side of the equation is reduced to zero (cf. Equation 5.16). It should be noted that the smaller the standard error of measurement, the more reliable or consistent are the test scores.
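As a rough numerical check on the standard error of measurement, the following Python sketch computes it from a standard deviation and a reliability coefficient and builds a band of 1.96 standard errors around an obtained score. It assumes the conventional classical-test-theory form, SE_meas = σ_t √(1 − r_tt); the function names and the obtained score of 80 are illustrative assumptions only.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SE_meas = sd * sqrt(1 - r_tt) (conventional form, assumed here)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(obtained, sem, z=1.96):
    """Band of z standard errors on either side of an obtained score."""
    margin = z * sem
    return obtained - margin, obtained + margin


# Illustrative values: sd = 10, r_tt = 0.90, hypothetical obtained score = 80
sem = standard_error_of_measurement(sd=10, reliability=0.90)
low, high = score_band(obtained=80, sem=sem)
print(f"SE_meas = {sem:.2f}; 95% band = {low:.1f} to {high:.1f}")
```

Passing `z=2.58` to `score_band` gives the corresponding 99 in 100 band, and a reliability of 1.00 returns a standard error of zero.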
RELIABILITY OF SPEED TEST

The distinction between speed and power tests is difficult to draw in actual practice. In fact, the distinction is one of degree, and most tests depend upon both power and speed in varying degrees. The single-trial reliability coefficients, like those found by odd-even or Kuder-Richardson techniques, are inapplicable to speed tests. In a speed test the individual differences in test scores are dependent on the speed of performance and, as such, the reliability coefficients found by these techniques will be spuriously inflated. This can well be illustrated through an example: suppose a test is completely based on speed, so that individual differences in scores are based upon the number of items attempted rather than upon any other kinds of errors. In such a situation, if A obtains a score of 80, he obviously will have 40 correct odd items and 40 correct even items. Likewise, individual B, with a score of 50, will have odd and even scores of 25 and 25 respectively. Consequently, the correlation between odd and even scores would be perfect, or +1.00, which will be entirely spurious and provides no correct information about reliability.

Now the question arises: What alternatives are available for estimating the reliability of a speed test? As an answer, some suggestions can be made. The test-retest method, if applicable, can be applied. Equivalent-form reliability can also be properly employed. The split-half technique can also be applied, provided the split is made in terms of time rather than in terms of items (Anastasi & Urbina 2002). In other words, such a split must be based upon separately timed parts of the test. The obvious way of splitting such a test is to administer two equivalent halves of the test with separate time limits. For example, all odd items and all even items may be printed on separate pages and each set of items be given one-half the time limit of the whole test. Subsequently, either the Spearman-Brown prophecy formula or some other appropriate formula can be used to find out the reliability of the whole test.
If, for any reason, it is not feasible to administer two half-tests separately, an alternative technique is to divide the total time into quarters and to find out a score for each of the four quarters. The number of items correctly completed within the first and fourth quarters can then be combined to represent one half-score, while those in the second and third quarters may be combined to yield the other half-score. Such a combination of quarters also tends to balance the cumulative effects of practice, fatigue and other factors. This method is especially satisfactory when items don't differ sharply in difficulty level.
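The spurious odd-even correlation on a pure speed test, and the quarter-based alternative, can be sketched in Python as follows. The examinee data are hypothetical, and the sketch assumes that every attempted item is answered correctly, so that a score equals the number of items reached; `pearson` and `halves_from_quarters` are illustrative helper names.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical pure speed test: score = number of items reached.
items_reached = [80, 50, 64, 72, 38]

# Odd-even split by item number: each half gets about half the items reached.
odd_half = [(n + 1) // 2 for n in items_reached]
even_half = [n // 2 for n in items_reached]
r_odd_even = pearson(odd_half, even_half)
print(f"odd-even r = {r_odd_even:.3f}")


def halves_from_quarters(q1, q2, q3, q4):
    """Combine separately timed quarter scores into two half-scores."""
    return q1 + q4, q2 + q3


# Hypothetical numbers of items correct in each quarter of the time limit
print(halves_from_quarters(12, 10, 9, 7))
```

The odd-even correlation comes out at 1.0 (within rounding error) purely because of how the halves are formed, which is why a time-based split is preferred. Half-scores built from quarters would be correlated across examinees and then stepped up with the Spearman-Brown formula.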


FACTORS INFLUENCING RELIABILITY OF TEST SCORES 
The reliability of test scores is influenced by a large number of factors, and all these factors can be categorized under two heads: extrinsic and intrinsic. Extrinsic factors are those factors which lie outside the test itself and tend to make the test reliable or unreliable. For example, group variability, environmental conditions, guessing by the examinees and momentary fluctuations in the individual are examples of extrinsic factors. Intrinsic factors are those factors which lie within the test itself and influence the reliability of the test. The range of the total score, characteristics of items and length of the test are some of the examples of intrinsic factors. A detailed discussion of both these factors follows.

Extrinsic Factors

Important extrinsic factors affecting the reliability of a test may be enumerated as follows:

1. Group variability: When the group of examinees being tested is homogeneous in ability, the reliability of the test scores is likely to be lowered. But when the examinees vary widely in their ability, that is, when the group of examinees is a heterogeneous one, the reliability of the test scores is likely to be high. The effect of variability on reliability can be examined by considering the extreme case in which there is no variability at all, so that all examinees receive the same score and, therefore, the standard deviation is zero. In such an extreme case, everyone in the group has a z score of zero. Since correlations (and reliability coefficients) are defined as the average product of z scores, the product of z scores and the reliability coefficient in this situation are both zero. Only when there is some variability in the group are correlation and reliability possible.
2. Guessing by the examinees: Guessing in a test is an important source of unreliability. In two-alternative response options, there is a 50% chance of answering the items correctly on the basis of the guess. In multiple-choice items, the chances of getting the answer correct purely by guessing are reduced. Guessing has two important effects upon the total test scores. First, it tends to raise the total score and thereby makes the reliability coefficient spuriously high. Second, guessing contributes to the measurement error since the examinees differ in exercising their luck over guessing the correct answer. Two examinees guessing on 60 'Yes-No' items out of 100 may differ: one examinee may get 80% of the 60 items correct purely by guessing and another examinee may get only 30% of the 60 items correct purely by guessing. Such differences would make a difference in the obtained score by contributing more to the error score (or error of measurement) than to the true score. The higher the contribution to the error score, the lower is the reliability of the test.
3. Environmental conditions: As far as possible, the testing environment should be uniform. Arrangements should be such that light, sound and other comforts are equal and uniform for all the examinees; otherwise the reliability of the test scores will tend to be lowered.

4. Momentary fluctuations in the examinee: Momentary fluctuations influence the test score, sometimes by raising the score and sometimes by lowering it. Accordingly, they tend to affect the reliability. A broken pencil, momentary distraction by the sudden sound of an aeroplane flying above, anxiety regarding noncompletion of homework, and a mistake in giving the answer with no way of changing it are some of the factors which explain momentary fluctuations in the examinee.





Intrinsic Factors 
The main intrinsic factors affecting the reliability of a test are as follows:


1. Length of the test: A longer test tends to yield a higher reliability coefficient than a shorter test. Lengthening the test or averaging the total scores obtained from several repetitions of the same test tends to increase the reliability. It has been demonstrated that averaging the test scores of several applications essentially gives the same result as increasing the length of the test. Suppose the test scores are being averaged after three repeated applications. If the test is lengthened to three times its present length and administered once, then the result of both averaging and lengthening will be the same. It is assumed, of course, that the added items have the same variance and the same inter-item correlations as the items of the original test. When the test has been repeated or increased by some multiple, the Spearman-Brown formula may be used to estimate the reliability of the lengthened test:

$$r_{nn} = \frac{n\,r_{tt}}{1 + (n - 1)\,r_{tt}} \tag{5.17}$$

where $r_{nn}$ = reliability coefficient of the lengthened test; $n$ = number of times the test has been increased; and $r_{tt}$ = reliability coefficient of the original test.

To illustrate, suppose an intelligence test of 100 items has a reliability coefficient of 0.80. If the test is increased to four times its present length, that is, 300 more items are added so that the test becomes 400 items, its reliability by Equation 5.17 becomes:

$$r_{nn} = \frac{4(0.80)}{1 + (4 - 1)(0.80)} = \frac{3.20}{1 + 3 \times 0.80} = \frac{3.20}{1 + 2.4} = \frac{3.20}{3.4} = 0.94$$

Thus increasing the length to four times the original, or repeating the same test on the same examinees four times, increases the reliability coefficient from 0.80 to 0.94. The reliability therefore increases with an increase in length, though not in the same proportion. The same formula applies when a test is shortened: if the number of items were reduced from 100 to 50, n would be 1/2.
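The Spearman-Brown estimate for a lengthened (or shortened) test can be checked with a short Python sketch. The function name `spearman_brown` is an illustrative choice; the values are those of the worked example above, plus a halved test.

```python
def spearman_brown(r_tt, n):
    """Estimated reliability when test length is multiplied by n.

    r_tt: reliability of the original test
    n:    length multiple (n > 1 lengthens the test, n < 1 shortens it)
    """
    return n * r_tt / (1 + (n - 1) * r_tt)


# 100-item test with r_tt = 0.80 lengthened to 400 items (n = 4)
r_400 = spearman_brown(0.80, 4)
# The same test cut from 100 to 50 items (n = 1/2)
r_50 = spearman_brown(0.80, 0.5)
print(f"n = 4: {r_400:.2f}; n = 1/2: {r_50:.2f}")
```

The gain is not proportional to the added length: quadrupling the test raises the coefficient from 0.80 to about 0.94, while halving it lowers the coefficient to about 0.67.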
There is another Spearman-Brown formula through which it is possible to estimate the length of the test needed to achieve a given level of the reliability coefficient. Again, the length of the test can be increased in any multiple (that is, 2, 3, 4 or 5 times). The equation is:

$$n = \frac{r_{nn}(1 - r_{tt})}{r_{tt}(1 - r_{nn})} \tag{5.18}$$

where $n$ = the multiple by which the test is to be lengthened; $r_{nn}$ = level of reliability coefficient desired; and $r_{tt}$ = reliability of the existing test.

To illustrate, suppose the reliability of an intelligence test is 0.60. By how many times should the test be lengthened in order to reach a reliability coefficient of 0.90? The answer can be given by solving Equation 5.18 as mentioned below:

$$n = \frac{0.90(1 - 0.60)}{0.60(1 - 0.90)} = \frac{(0.90)(0.40)}{(0.60)(0.10)} = \frac{0.36}{0.06} = 6$$

(When n is a fraction, it should be rounded up to the next whole number, such as 2.3 to 3, and so on.)

Thus, in order to achieve a reliability coefficient of 0.90, the present test should be lengthened to six times its present length. The use of both the above Spearman-Brown formulas makes two assumptions. The first assumption is that the new items added to the original test must have the same statistical properties as the items of the original test. In other words, the added items must have the same average difficulty value and the same inter-item correlations as the original items. The second assumption is that the added items must not influence the examinees' responses.
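The length multiple required for a target reliability can be computed with a short Python sketch. `lengthening_factor` is an illustrative name, and the small rounding step before `math.ceil` is an implementation assumption added only to guard against floating-point noise when rounding up to a whole multiple.

```python
import math


def lengthening_factor(r_tt, r_target):
    """Multiple n by which a test must be lengthened to reach r_target."""
    if not (0 < r_tt < 1 and 0 < r_target < 1):
        raise ValueError("both reliabilities must lie strictly between 0 and 1")
    return r_target * (1 - r_tt) / (r_tt * (1 - r_target))


# Existing reliability 0.60, desired reliability 0.90
n = lengthening_factor(0.60, 0.90)

# Round a fractional multiple up to the next whole number (e.g. 2.3 -> 3);
# round() first so that 6.000000000000002 is not pushed up to 7.
whole_multiple = math.ceil(round(n, 6))
print(f"n = {n:.2f}; lengthen the test {whole_multiple} times")
```

Substituting the result back into the lengthening formula, 6(0.60) / (1 + 5(0.60)) = 3.6 / 4.0 = 0.90, recovers the target reliability.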
According to the Spearman-Brown formula, if a test initially has a reliability of 0.50, doubling the number of items increases the coefficient of reliability to 0.67. If the length of the test is again doubled, the estimated reliability increases to 0.80. Ebel (1979) has shown that doubling the length of a test quadruples the true variance while it only doubles the error variance. Suppose that a test of thirty items has an obtained variance of 40 and a true variance of 20, and thus a reliability of 20/40 = 0.50. Now suppose that the number of items is doubled. From the Spearman-Brown formula (Equation 5.17), the reliability for a sixty-item test would be 0.67 if the thirty-item test had a reliability of 0.50. According to Ebel, because the length of the test is doubled, the true variance quadruples, that is, 20 × 4 = 80. The error variance only doubles, that is, 20 × 2 = 40. The obtained variance is then 80 + 40 = 120, and the reliability is 80/120 = 0.67, the same value as obtained from the Spearman-Brown formula.
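Ebel's variance argument can be verified numerically. The Python sketch below assumes, as the argument does, that multiplying test length by n multiplies the true variance by n squared and the error variance by n; the function names are illustrative.

```python
def reliability_after_lengthening(true_var, error_var, n):
    """Reliability as true variance / obtained variance after lengthening.

    Assumes true variance grows with the square of the length multiple n,
    while error variance grows only in proportion to n.
    """
    new_true = n ** 2 * true_var
    new_error = n * error_var
    return new_true / (new_true + new_error)


def spearman_brown(r_tt, n):
    """Spearman-Brown estimate for a test lengthened n times."""
    return n * r_tt / (1 + (n - 1) * r_tt)


# Thirty-item test: obtained variance 40 = true variance 20 + error variance 20
from_variances = reliability_after_lengthening(true_var=20, error_var=20, n=2)
from_formula = spearman_brown(r_tt=20 / 40, n=2)
print(f"variance route: {from_variances:.2f}; Spearman-Brown: {from_formula:.2f}")
```

Both routes give the same value for any n, since n²T / (n²T + nE) simplifies to nT / (nT + E), which is the Spearman-Brown expression written in terms of variances.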
2. Range of the total scores: If the obtained total scores on the test are very close to each other, that is, if there is little variability among them, the reliability of the test is lowered. On the other hand, if the total scores on the test vary widely, the reliability of the test is increased. In statistical terms, this means that when the standard deviation of the total scores is high, the reliability is also high, and when the standard deviation of the total scores is low, the reliability is also low.

3. Homogeneity of items: When the items measure different functions and the intercorrelations of items are zero or near it (that is, when the test is a heterogeneous one), the reliability is zero or very low. When all items measure the same function and the inter-item correlation is high, the reliability of the test is also high.

4. Difficulty value of items: In general, items having indexes of difficulty at 0.5 or close to it yield higher reliability than items of extreme indexes of difficulty. In other words, when items are too easy or too difficult, the test yields very poor reliability (because such items do not contribute to the reliability), poorer than when items are of moderate difficulty value. Sometimes such items could also be wholly indiscriminatory, and then they would contribute nothing to reliability (cf. Table 4.1).

5. Discrimination value: When the test is composed of discriminating items, the item-total correlation is likely to be high and then the reliability is also likely to be high. But when items do not discriminate well between superior and inferior examinees, that is, when items have poor discrimination values, the item-total correlation is affected, which ultimately attenuates the reliability of the test.

6. Scorer reliability: Scorer reliability (also known as reader reliability) is also an important factor which affects the reliability of the test. By scorer reliability is meant how closely two or more scorers agree in scoring or rating the same set of responses. If they do not agree, the reliability is likely to be lowered.


HOW TO IMPROVE RELIABILITY OF TEST SCORES? 
Reliability of test scores can be improved by controlling those factors which adversely affect the reliability of the test. The following suggestions are useful for improving the reliability:

1. The group of examinees should be heterogeneous, that is, the examinees should vary widely in the ability or trait being measured.
2. Items should be homogeneous.
3. The test should preferably be a longer one.
4. As far as possible, items should be of moderate difficulty values; in other words, the indexes of item difficulty should lie in the range of 0.40 to 0.60.
5. Items should be discriminatory ones.

Apart from these general suggestions, there are three common approaches for improving the reliability of the test. One approach emphasizes the length of the test; the second approach emphasizes throwing out items that pull down the reliability; and the third approach emphasizes correction for attenuation.








The approach emphasizing an increase in the length of the test assumes that if new items similar to the original set of items are added, the reliability of the test would tend to increase. According to the domain-sampling model, each item in the test is an independent sample of the trait or ability being measured. The larger the sample, the more likely it is that the test will represent the true characteristic. According to this model, the reliability of a test increases as the number of items increases. Formula 5.18 is used to estimate the number of times the test should be lengthened in order to reach the projected level of reliability.

Another approach to improving reliability is to discard the items that run down the reliability. Under this approach, two techniques are commonly applied: factor analysis and item analysis. For enhancing reliability, it must be ensured that all items measure the same thing. Factor analysis rests on the view that tests are most reliable if they are unidimensional (Loehlin 1998; Tabachnick & Fidell 1996). This means that one factor should account for considerably more of the variance than any other factor. Items that fail to load on this factor should be omitted or discarded. In item analysis, generally the correlation between each item and the total score for the test is examined. This form of item analysis is called discriminability analysis. When the correlation between a single item and the total test score is low, the item is probably measuring something different from the other items on the test. On the other hand, it may also mean that the item is so easy or so hard that the examinees do not differ in their responses to it. In either case, the low correlation indicates that the items are pulling down the estimate of reliability and, therefore, they should be excluded.

The third approach to enhancing reliability is to go for correction for attenuation, for which the reader is referred to Chapter 6.
ESTIMATION OF TRUE SCORES

True score is statistically defined as the mean of an unlimited number of measurements or scores obtained by the same examinee on the same test. It is practically impossible to take an unlimited number of measurements or scores. As such, experts have devised an indirect method of estimating the true score of an examinee. When the reliability of the test is known, an examinee's true score can be estimated by a regression equation. The equation is as follows:

$$X_{\infty} = r_{tt}X_1 + (1 - r_{tt})M_1 \tag{5.19}$$

where $X_{\infty}$ = estimated true score; $r_{tt}$ = reliability coefficient of the test; $X_1$ = obtained score; and $M_1$ = mean of the test scores.

The standard error of the estimated true score is given by the following formula:

$$SE_{\infty} = \sigma_t\sqrt{r_{tt} - r_{tt}^2} \tag{5.20}$$

where $SE_{\infty}$ = standard error of the estimated true score; $\sigma_t$ = standard deviation of the test scores; and $r_{tt}$ = reliability coefficient of the test.

To illustrate, suppose that Mohan has obtained a score of 50 on an intelligence test. The test has a reliability coefficient of 0.80, and the mean and the standard deviation of the test are 30 and 10 respectively. What would the true score of Mohan be? By Equation 5.19, the estimated true score will be (0.80)(50) + (1 − 0.80)(30) = 40 + (0.20)(30) = 40 + 6 = 46. Thus Mohan's estimated true score is 46 and his obtained score is 50. The standard error of this estimated true score by Equation 5.20 is:

$$SE_{\infty} = 10\sqrt{0.80 - (0.80)^2} = 10\sqrt{0.80 - 0.64} = 10\sqrt{0.16} = 4$$

The 0.95 confidence interval (or 5% level of significance) for the true score would be ±1.96 × 4 = 7.84 units above or below the estimated true score, that is, the true score will be between 38.16 and 53.84 (or in the 38-54 interval). Thus, only 5 scores in 100 will fall below 38.16 or above 53.84. Likewise, the 0.99 confidence interval (or 1% level of significance) for the true score would be ±2.58 × 4 = 10.32 units above or below the estimated true score, that is, the true score will be somewhere between 35.68 and 56.32 (or in the 35-56 interval). Chances are that 1 score out of 100 will fall outside this range. For the 0.95 confidence interval and the 0.99 confidence interval, the ranges of the limits are 16 units and 21 units respectively. In the above example, although the reliability of the test is high (0.80), the estimate of the true score has an ample range. This indicates that the expert's attempt to estimate the true score of an examinee on the basis of the reliability coefficient of the test is not highly satisfactory.
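The estimate of the true score and its confidence limits can be reproduced with a short Python sketch. The function names are illustrative; the inputs are those of the example (obtained score 50, mean 30, standard deviation 10, reliability 0.80).

```python
import math


def estimated_true_score(obtained, mean, r_tt):
    """Regression estimate: r_tt * X + (1 - r_tt) * M."""
    return r_tt * obtained + (1 - r_tt) * mean


def se_estimated_true_score(sd, r_tt):
    """Standard error of the estimated true score: sd * sqrt(r_tt - r_tt**2)."""
    return sd * math.sqrt(r_tt - r_tt ** 2)


estimate = estimated_true_score(obtained=50, mean=30, r_tt=0.80)
se = se_estimated_true_score(sd=10, r_tt=0.80)

# Confidence limits at the 0.95 (z = 1.96) and 0.99 (z = 2.58) levels
for z in (1.96, 2.58):
    low, high = estimate - z * se, estimate + z * se
    print(f"z = {z}: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

The interval widths of roughly 16 and 21 score points show how wide the band around an estimated true score remains even with a reliability of 0.80.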
INDEX OF RELIABILITY

In the previous section the true score on a test was defined as the mean of an unlimited number of measurements made by the same person on the same test or on the equivalent form of the same test. Theoretically, the correlation between the obtained scores and the true scores should be perfect. But in reality this is the exception rather than the rule. The index of reliability is defined as the correlation coefficient between the obtained scores and their theoretical true scores. This statistic indicates the extent to which we can depend upon obtained scores as measures of true scores. The index of reliability thus gives the maximum correlation which the test is capable of yielding in its present form. The index of reliability is statistically equal to the square root of the reliability coefficient of the test. Hence, its formula is:

$$r_{t\infty} = \sqrt{r_{tt}} \tag{5.21}$$

where $r_{t\infty}$ = the index of reliability; and $r_{tt}$ = the reliability coefficient of the test.

Suppose a test has a reliability coefficient of 0.65. Its index of reliability through Equation 5.21 then becomes $\sqrt{0.65} = 0.806 \approx 0.81$, which means that 0.81 is the maximum correlation which this test can presently yield.


RELIABILITY OF DIFFERENCE SCORE 


A researcher is often faced with a situation in which he has to look into differences in scores on two tests. For example, in evaluating a training program, he may be interested in the difference between scores obtained before and after the training. The reliability of such a difference score is estimated by the following formula:

$$r_{DD} = \frac{\dfrac{r_{xx} + r_{yy}}{2} - r_{xy}}{1 - r_{xy}} \tag{5.22}$$

where $r_{DD}$ = reliability of the difference between scores on X and scores on Y; $r_{xx}$ = reliability of X; $r_{yy}$ = reliability of Y; and $r_{xy}$ = correlation between X and Y.

A simple look at Formula 5.22 reveals a very important fact, that is, all other things being equal, the higher the correlation between X and Y, the lower the reliability of the difference score. To understand this apparent paradox, one must know that the difference between X and Y reflects two important factors: differences in true scores and measurement error. If X and Y are highly correlated, the true-score part of X overlaps considerably with the true-score part of Y. As a result, there will hardly be any difference in true scores on the two variables. The differences between scores on X and Y will then be due almost entirely to measurement error. Thus the more highly correlated X and Y are, the less reliable is their difference.
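The effect of the correlation between two tests on the reliability of their difference can be illustrated with a short Python sketch. It assumes the usual form of the difference-score formula, in which the average of the two reliabilities minus the intercorrelation is divided by one minus the intercorrelation; the reliabilities and correlations used are hypothetical values chosen only for illustration.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of the difference X - Y.

    r_xx, r_yy: reliabilities of the two tests
    r_xy:       correlation between the two tests
    """
    if r_xy >= 1:
        raise ValueError("r_xy must be less than 1")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Two tests, each with a hypothetical reliability of 0.80
low_overlap = difference_score_reliability(0.80, 0.80, r_xy=0.20)
high_overlap = difference_score_reliability(0.80, 0.80, r_xy=0.60)
print(f"r_xy = 0.20: {low_overlap:.2f}; r_xy = 0.60: {high_overlap:.2f}")
```

With the same two tests, raising their intercorrelation from 0.20 to 0.60 lowers the reliability of the difference from 0.75 to 0.50.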
RELIABILITY OF COMPOSITE SCORE

A composite score, or summed score, is one which is obtained by adding the scores on several different tests. Research has shown that whereas difference scores are often unreliable, composite scores are typically more reliable than the tests which make up the composite score. If the individual tests are highly correlated, the reliability of the sum of those tests will be higher. The reliability of a composite score is estimated by Formula 5.23.

$$r_{ss} = 1 - \frac{k - k\,\bar{r}_{ii}}{k + (k^2 - k)\,\bar{r}_{ij}} \tag{5.23}$$

where $r_{ss}$ = reliability of the composite score; $k$ = number of tests; $\bar{r}_{ii}$ = average test reliability; and $\bar{r}_{ij}$ = average correlation between tests.

The basic principle for determining the reliability of a composite score is very similar to that involved in determining the internal consistency of a test. If more such tests are combined and the intercorrelation among them is high, they are likely to yield a higher reliability of the composite score. Thus adding together scores on several highly correlated tests achieves exactly the same goal as adding together scores on several positively correlated test items to form a single test score.
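The composite-score formula can be tried out with a short Python sketch. It assumes the form r_ss = 1 − (k − k·r̄_ii) / (k + (k² − k)·r̄_ij); the number of tests and the average values used are hypothetical.

```python
def composite_reliability(k, mean_r_ii, mean_r_ij):
    """Reliability of a sum of k tests.

    k:         number of tests in the composite
    mean_r_ii: average reliability of the individual tests
    mean_r_ij: average correlation between the tests
    """
    return 1 - (k - k * mean_r_ii) / (k + (k ** 2 - k) * mean_r_ij)


# Three tests with a hypothetical average reliability of 0.70
correlated = composite_reliability(k=3, mean_r_ii=0.70, mean_r_ij=0.40)
uncorrelated = composite_reliability(k=3, mean_r_ii=0.70, mean_r_ij=0.0)
print(f"r_ij = 0.40: {correlated:.2f}; r_ij = 0.00: {uncorrelated:.2f}")
```

When the tests are uncorrelated, the composite is only as reliable as the average test (0.70); with an average intercorrelation of 0.40, it rises to about 0.83.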
Review Questions

1. Give the technical or logical meaning of reliability. Discuss any two methods of estimating reliability of test scores.
2. What is meant by internal consistency reliability? Discuss any two methods of assessing internal consistency reliability.
3. Discuss the relation between length of the test and reliability of the test. Is reliability also influenced by difficulty value of the item?
4. Discuss the factors that intrinsically affect the reliability of the test scores.
5. Discuss the major criteria of alternate-forms test. How can reliability of an alternate-forms test be increased?
6. Write short notes on the following:
(a) Index of reliability
(b) Homogeneity of test and its relation to reliability
(c) K-R20 and K-R21
(d) Rulon formula
(e) Flanagan formula
(f) Coefficient alpha
7. Outline the history and theory of reliability.
8. What is Bandwidth-Fidelity dilemma? How does it affect the reliability of the test?
9. Suggest measures for calculating reliability of difference score.
10. Outline a plan for calculating reliability of composite score.


Correlation Methods
Expectancy Tables
Cut-off Score
Miscellaneous Techniques
• Factors Influencing Validity
Length of the Test
Range of Ability (or Sample Heterogeneity)
Ambiguous Directions
Socio-cultural Differences
Addition of Inappropriate Items
• Concept of Cross-Validation
• Extravalidity Concerns
• Relation of Validity to Reliability






MEANING OF VALIDITy 


Validity is another important characteristic of a scientific instrument. The term 'validity' means truth or fidelity. Thus, validity refers to the degree to which a test measures what it claims to measure. Validity is not the self-correlation of the test; rather, it is the correlation with some outside independent criteria, which are regarded by experts as the best measure of the trait or ability being measured by the test.

Different writers have defined validity [...] has said, "The [...]" Lindquist (1951, 213) has defined validity of a test as "the accuracy with which it measures that which it is intended to measure, or as the degree to which it approaches infallibility in measuring what it purports to measure." [...] defined validity as "the [...] the quantity it is believed to measure." Paraphrasing the definition of validity from the influential Standards for Educational and Psychological Testing, [...] said, "A test is valid to the extent that inferences made from it are appropriate, meaningful and useful." These definitions point to the fact that, for determining validity, the test must be compared with some ideal independent measures or criteria. The correlation coefficient computed between the test and such criteria is known as the validity coefficient. 'Independent criteria' refer to some measure of the trait or the group of traits (outside the test) that the test itself claims to measure.

The concept of validity is concerned with generalizability. When a test is a valid measure, its conclusion can be generalized in relation to the general population. When a criterion has been set and when both the test and the criterion are reliable, the correlation between the test and the criterion can safely be taken as evidence of validity of the test.

Although validity defined the meaning of tests and measures, the term itself started losing its meaning. However, in 1985, a joint committee of the American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME) published a very important booklet entitled Standards for Educational and Psychological Testing, which was revised in 1999. This joint committee, by rejecting the numerous possible definitions of validity, held that validity is nothing but evidence for inferences made about a test score. Such evidence may be content-related, criterion-related or construct-related. In this sense, validity refers to the evidence for what can be said on the basis of the test scores and not to the tests themselves (Landy 1986). Validity has five important properties:

1. Validity is a relative term. A test is not generally valid. It is valid only for a particular purpose. For example, a test of statistical ability will be valid only for measuring statistical ability because it is put to use only for measuring that ability. It will be worthless for other uses, such as measuring the knowledge of geography, history, etc. It is obvious from this interpretation that one validates not a measuring instrument but rather some use to which the test is put.

2. Validity is not a fixed property of the test because validation is not a fixed process but rather an unending process. With the discovery of new concepts and the formulation of new meanings, the old contents of the test become less meaningful. Therefore, they need to be modified radically in the light of new meanings. Hence, the validity of a test computed in the beginning becomes less dependable, and the test constructor should compute a fresh validity of the test in the light of the new meanings attached.

3. Validity, like reliability, is a matter of degree and not an all-or-none property. A test meant for measuring a particular trait or ability cannot be said to be either perfectly valid or not valid at all.

4. Validity is a unitary concept. In the revision done in 1999 of the Standards for Educational and Psychological Testing by the American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME), the view that there are different types of validity has been discarded. Instead, validity has been viewed as a unitary concept based on various kinds of evidence.

5. Validity involves an overall evaluative judgement. In other words, it requires an evaluation of the degree to which the interpretation and uses of test results are justified by supporting evidence as well as in terms of the consequences of those interpretations and uses.

The Standards for Educational and Psychological Testing have provided five sources of evidence for evaluating the validity of a specific use or interpretation. These sources of evidence are (a) test content, (b) response processes, (c) internal structure, (d) relations to other variables and (e) the consequences of testing. It means that validity may include a consideration of the content measured, the ways of responding by the respondents, the relationship of the individual items to the test scores, the relationship of the performance to other measures as well as the consequences of using the assessment.

ASPECTS OF VALIDITY Srvinh wes be Geba, Bawink 
Since a test is valid to the extent that it serves the purpose for i t there are different aspects of 
there are many purposes of testing, it automatically follows tha shreg tain piroeses oF wei 
validity representing each purpose of the test. Ordinarily, there are | 


1. Representation of a universe of content: The tester may wish to determine how an examinee
performs at present in a universe of situations (or content) that the test represents. For
example, through a spelling test (a kind of achievement test) the tester may wish to establish
the examinee's level of English spelling.

2. Prediction: The tester may wish to predict an examinee's future standing on a certain
variable, or to estimate his present standing on a particular variable. For example, the tester
may wish to measure mechanical aptitude and predict future performance on a job. Likewise, the
tester may wish to determine the level of emotional adjustment of an examinee through an
appropriate measure so that he may be able to infer his day-to-day adjustment with his peers.

3. Measurement of a hypothetical trait or quality (or construct): A tester may wish to determine
the extent to which an examinee possesses some trait(s) as measured by the test performance. For
example, a tester may wish to know how an examinee scores on abstract measures like
extroversion, neuroticism and intelligence, which cannot otherwise be measured directly.
These three purposes are basic to any testing programme. Corresponding to each purpose, there is
a type of validity. The typology is based upon the categories given in the Standards for
Educational and Psychological Tests and Manuals published by the American Psychological
Association (1966). One of the merits of this typology is that various types of validity, which
generally create confusion if treated separately, have been reduced to three main types. The
three types of validity are: (i) content or curricular validity, (ii) criterion-related validity,
and (iii) construct validity.

However, a word of caution has been sounded by the recent version of the Standards for
Educational and Psychological Tests. The consensus document of the Standards cautions against
separation of validity into subcategories like predictive validity, criterion validity and content
validity. It further opines that although categories for grouping different types of validity are
convenient, the use of the categories does not mean that there are distinct forms of validity.
Sometimes psychologists show overenthusiasm for making distinctions among categories when
indeed the categories tend to overlap (Anastasi 1995; Messick 1998). The 1999 edition of the
Standards no longer recognizes the different categories of validity; rather, it recognizes
different categories of evidence for validity.


Content or Curricular Validity 


Content validity is also designated by other terms such as intrinsic validity, relevance, curricular
validity and representativeness. Content validity is a nonstatistical type of validity that is usually
associated with achievement tests. When a test is constructed so that the content of its items measures
what the whole test claims to measure, the test is said to have content or curricular validity. Thus
content validity is concerned with the relevance of the contents of the items, individually and as a
whole. Each individual item or content of the test should correctly and adequately sample or
measure the trait or the variable in question and the test, as a whole, should contain only the
representative items of the variable to be measured by the test. Anastasi (1968, 100) has said that
content validity, “involves essentially the systematic examination of the test content to determine
whether it covers a representative sample of the behaviour domain to be measured.” Content
validity is needed in the tests which are constructed to measure how well the examinee has
mastered the specific skills or a certain course of study.


In fact, content validity is the degree to which a test measures an intended content area. 
Psychometricians are of the view that content validity requires both item validity and sampling 
validity. Item validity is basically concerned with whether the test items represent measurement
in the intended content area, and sampling validity is concerned with the extent to which the test
samples the total content area. For example, a test designed to measure knowledge of biology
facts might have good item validity because all the items indeed deal with biology facts, but
poor sampling validity, that is, all its items may deal only with vertebrates. Thus a test with
good content validity adequately samples the appropriate content area. This is important because
we cannot possibly measure each and every aspect of a certain content area, yet the tester wants
to make inferences about performance in the whole content area based upon performance on the
items included in the test. Such inferences can be correctly done only when the test items
adequately sample the domain of possible items.

Recently two new concepts relevant to content validity have been introduced in the latest
version of the Standards for Educational and Psychological Testing (AERA, APA & NCME 1999).
These two concepts are construct under-representation and construct-irrelevant variance.
Construct under-representation means failure to incorporate important components of the
construct. For example, if a test of history included only ancient history but not modern
history, the validity of the test would suffer because of construct under-representation.
Construct-irrelevant variance is said to occur when scores are influenced by factors irrelevant
to the construct. For example, scores on a test of intelligence might be influenced by factors
such as test anxiety or reading comprehension.


The content validity of a test is examined in two ways: (i) by the expert's judgement, and (ii) by
statistical analysis. Suppose the investigator wants to examine the content validity of a test on
Indian history. For this purpose, the contents (or items) of the test will be submitted to a group of
subject-matter experts. These experts will judge whether or not the items represent all the
important events of Indian history, whether or not some additional items should be added for
complete coverage, what the relative weights of the items of a particular event should be, etc. The
validity of the contents or items will be dependent upon a consensus judgement of the majority of
the subject-matter experts. Statistical methods may also be applied to ensure that all items
measure the same thing, that is, a statistical test of internal consistency may provide evidence for
the content validity. Another statistical technique for ensuring content validity may be to
correlate the scores on two independent tests, both of which are said to measure the same
thing. Suppose one wants to know the content validity of a Hindi spelling test. Then the teacher
can correlate the scores on the said test with another similar Hindi spelling test. A high
correlation coefficient would provide an index for the content validity. Although a high
correlation coefficient can easily be demonstrated in two sets of scores obtained from two similar
tests, it does not fully guarantee content validity because high correlation may be due to the fact
that both tests measure the same incorrect things. Data relating to the item-discriminating
power may also provide circumstantial evidence for the content validity. Items showing such
power, that is, items discriminating among superior and inferior examinees, are said to have
content validity. The following points should be fully covered for ensuring full content validation
of a test:
1. The area of content (or items) should be specified explicitly so that all major areas are
adequately covered by the items in equal proportion. This rule should be followed rigidly for
removing the general tendency of the item writers to include such items which are readily
available and are easily written.
2. Before the item writing starts, the content area should be fully defined in clear words and
must include the objectives, the factual knowledge and the application of principles, and not
just the subject matter.
3. The relevance of contents or items should be established in the light of the examinee's
responses to those contents and not in the light of apparent relevance of the contents
themselves. This is because the contents may appear to be relevant for a specific skill or
a certain course of study but may not be equally relevant to the examinees, who may
misunderstand them and give inappropriate responses. Thus, content validity is dependent not
upon the apparent relevance of the test items, but rather upon the relevance of the responses
given by the examinees towards those test items.

Content validity is most appropriately applied to the achievement test. For the aptitude test,
the intelligence test and the personality test, content validity is not sufficient and sometimes
may be a misleading index because the contents of these tests do not bear the same apparent
resemblance or similarity to the trait or behaviour they are attempting to sample as is found in
achievement tests.
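The internal-consistency check mentioned above as statistical evidence for content validity can be sketched in code. The text does not name a particular statistic; Cronbach's alpha is used here as one common choice, which is an assumption, and the function name and the 0/1 item scores are hypothetical illustrations rather than data from any real test.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of examinees, each a list of k item scores.

    Assumption: alpha is used as the internal-consistency statistic;
    the source text only says "a statistical test of internal consistency".
    """
    k = len(item_scores[0])
    # Variance of each item across examinees.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the examinees' total scores.
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 examinees answering 4 items scored 0 (wrong) or 1 (right).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

A high value suggests only that the items hang together; as the text notes for correlated tests, items can be consistent with one another and still sample the wrong content, so such a figure supplements the experts' judgement rather than replacing it.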
Face validity is often confused with content validity but in the strict sense it is quite different.
Face validity refers not to what the test actually claims to measure but to what it appears to
measure superficially. In fact, face validity is the mere appearance that the test has
validity (Kaplan & Saccuzzo). Face validity should not be taken in the technical sense
nor should it be regarded as a substitute for objectively determined validity. When a test
looks valid to those who take it because its content appears logically linked to what is being
assessed, the test is said to have face validity. Face validity is, in fact, a matter of social
acceptability and not a technical form of validity like content validity, criterion-related
validity and construct validity.

Obviously, then, the purpose of face validity is to establish rapport and secure cooperation,
because when test items do not appear to be valid to the examinees, they may not cooperate in
responding and may give irrelevant answers because such items themselves appear to be
irrelevant. Face validity thus keeps the examinees motivated. As Sackett et al. (2001) have rightly
commented, when a test looks appropriate for the performance situation in which examinees will
be expected to perform, they tend to react positively.

The above interpretation should not be taken to mean that face validity is a worthless
quality. Face validity is needed in all types of tests and helps a lot in improving the objectively
determined validity of the test by improving the wording and structure of the test content.
Criterion-related Validity

Criterion-related validity is a very common and popular type of test validity. As its name implies,
criterion-related validity is one which is obtained by comparing (or correlating) the test scores
with scores obtained on a criterion available at present or to be available in the future. The
criterion is defined as an external measure of what the test claims to measure. In this sense, by
way of defining the validity of a test, a 1965 formulation holds that the validity of a test is
the correlation between the test scores and the “true” (that is, perfectly reliable) criterion
scores. There are two subtypes of criterion-related validity: (a) predictive validity, and (b)
concurrent validity. A detailed discussion of these two subtypes is given below.


Predictive Validity

Predictive validity is also designated as empirical validity or statistical validity. As the name
implies, in predictive validity a test is correlated against a criterion to be made available
sometime in the future. In other words, the test scores are obtained first, the criterion scores
are obtained after a time gap, and the correlation between the test scores and the criterion
scores becomes the index of validity coefficient. Marshall & Hales (1972, 110) have said that the
validity coefficient is a Pearson product-moment correlation between the scores on the test and
an appropriate criterion.

Let us take an example to illustrate predictive validity. Suppose an investigator wants to
predict success in a postgraduate class in terms of grades A, B, C, D and E, A being the best
grade and E being the worst grade. The investigator may administer a test of intelligence to the
students at the time of their admission to the class and thus obtain a set of scores. After two
years, on the basis of classroom performance, the students are graded according to the above
categories. Here, grade points would be the criterion scores. A correlation, through a scatter
diagram, may be computed between the intelligence test scores and the grade points obtained
after two years. If the correlation is high, we can say with confidence that scores on
intelligence are predicting the future success of the students in the postgraduate class. The
correlation becomes the index of the validity coefficient.

Likewise, in an industry, the management may wish to select such workers who exhibit the best
performance on the job. For this purpose, they select a test which has high predictive validity.
They may, for example, administer a test of special ability (or a mechanical aptitude test) for
the selection of workmen. The scores on the mechanical aptitude test are correlated, after a
year or two, against their average performance as measured in terms of units of product. If the
correlation is high, the mechanical aptitude test is said to have high predictive validity.
Predictive validity is needed for tests which include long-range forecast of academic
achievement, forecast of vocational success and forecast of reaction to therapy.
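The postgraduate example above amounts to computing a Pearson product-moment correlation between test scores obtained at admission and criterion scores obtained later. A minimal sketch follows; the function name, the eight students and their scores are hypothetical, and the coding of grades as points (E = 1 up to A = 5) is an assumption made only so that the grades can be correlated.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical intelligence test scores at the time of admission.
test_scores = [95, 100, 105, 110, 115, 120, 125, 130]
# Hypothetical grade points two years later (assumed coding: E=1, D=2, C=3, B=4, A=5).
grade_points = [1, 2, 2, 3, 3, 4, 5, 5]

validity_coefficient = pearson_r(test_scores, grade_points)
print(round(validity_coefficient, 2))
```

The same function serves for concurrent validity; the only difference is that the criterion scores are collected at the same time as the test scores rather than after a lapse of time.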
Concurrent Validity

Concurrent validity is another subtype of criterion-related validity. Concurrent validity is very
similar to predictive validity, except that there is no time gap in obtaining test scores and criterion
scores. The test is correlated with a criterion which is available at the present time. Suppose a
newly constructed test of intelligence is correlated with a standardized test of intelligence. The
resulting coefficient of correlation will be an indicator of concurrent validity. If the
correlation is too high, it will indicate that the new test is a needless duplication of the
previous one. Likewise, an intelligence test may be validated (or correlated) against the marks
obtained in the previous examination. This will also be an example of concurrent validity.
Concurrent validity is most suitable to tests meant for diagnosis of the present status rather
than for prediction of future outcomes.
ef Me 


Concurrent validity can be determined by establishing relationship or discrimination. The
relationship method is simple and it involves determination of the relationship between scores on
the test and scores on some other established criterion which are concurrently available. In this
method, the steps involved are as follows:

(a) The test is administered to a defined group of individuals.

(b) The criterion or previously established valid test is also administered to the same group
of individuals.

(c) Subsequently, the two sets of scores are correlated.

(d) The resulting coefficient indicates the concurrent validity of the test. If the coefficient is
high, the test has good concurrent validity.

The discrimination method of computing concurrent validity involves determining whether
the test scores can be used to discriminate between persons who possess a certain characteristic
and those who don't possess such a characteristic, or between those who possess more of a
certain characteristic and those who possess less of that characteristic. For example, a mental
adjustment inventory would have concurrent validity if scores obtained on it could be used to
discriminate between persons who are well adjusted and those who are not.
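The discrimination method can be illustrated with a short sketch. The text does not prescribe a statistic; the point-biserial correlation between inventory scores and known group membership is used here as one reasonable choice, which is an assumption, and the function name, scores and group labels are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(scores, group):
    """Point-biserial correlation between scores and a 0/1 group label.

    group: 1 = possesses the characteristic, 0 = does not.
    """
    in_group = [s for s, g in zip(scores, group) if g == 1]
    out_group = [s for s, g in zip(scores, group) if g == 0]
    p = len(in_group) / len(scores)  # proportion possessing the characteristic
    q = 1 - p
    # Difference between group means, scaled by the overall standard deviation.
    return (mean(in_group) - mean(out_group)) / pstdev(scores) * sqrt(p * q)


# Hypothetical adjustment inventory scores for eight persons.
inventory_scores = [30, 32, 35, 31, 18, 20, 22, 19]
# Hypothetical independent classification: 1 = well adjusted, 0 = not.
well_adjusted = [1, 1, 1, 1, 0, 0, 0, 0]

r_pb = point_biserial(inventory_scores, well_adjusted)
print(round(r_pb, 2))
```

A coefficient near zero would mean that the inventory scores do not separate the two groups, in which case the test would lack concurrent validity by this method.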


A comparative study of predictive validity and concurrent validity has revealed that for the 
same test, predictive Validity is usually lower than concurrent validity. The reason is that the 
degree of association between the test and the criterion decreases over time. The academic 
aptitude test scores will be more highly related with the present academic achievement of the 
examinees than with the academic achievement to be obtained five years later. Naturally, then 
predictive validity will be somewhat lower than concurrent validity. If concurrent validity of a test
happens to be zero then its predictive validity is most likely to be zero or close to it.





Whether the test constructor is computing predictive validity or concurrent validity, he is
faced with one vital problem. This problem, which is comparatively more acute in predictive
validity, is the identification and selection of an appropriate and adequate criterion. An
inappropriate and inadequate criterion may bring down the coefficient of correlation and thus
the validity of the test is adversely affected. This may happen, for example, when one is
validating a test of intelligence against a future criterion. In the industrial situation, a
test for workers is usually validated against the criterion of the units of production over a
specified period of time. Again, we are faced with a similar problem because units of production
over a specified period of time may not be solely related to the workers' ability, but rather to
a variety of factors like interest, motivation of the workers, speed of machines, regular supply
of electricity and raw materials, etc. Likewise, when we are computing the concurrent validity of
a test, the criterion may not itself be as dependable as it should be. In such a situation the
correlation coefficient would be attenuated or reduced and thus the validity of the test would
be substantially lower than the true theoretical relationship between the test and the
criterion. It is, therefore, suggested that the obtained validity coefficient should be
corrected for attenuation.

Correction for Attenuation 


In computing criterion-related validity coefficients, two types of correction for attenuation are
put into order: a full correction, which includes correction in both the test as well as the
criterion, and a one-way correction, which includes correction in the criterion only. Sometimes
both the test (which is being validated) and the criterion (against which the test is being
validated) are unreliable due to the faulty methods of test construction. Hence, the correlation
coefficient between such a test and the criterion is automatically lowered. Therefore, a general
formula, which takes into account the unreliability (or measurement errors or chance errors) in
the test as well as in the criterion, is applied:

$$r_c = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}} \qquad (6.1)$$

where $r_c$ = correlation between test x and criterion y corrected for attenuation; $r_{xy}$ =
obtained validity coefficient, that is, obtained correlation between test x and criterion y;
$r_{xx}$ = reliability coefficient of test x; and $r_{yy}$ = reliability coefficient of the
criterion y.


Equation 6.1 provides a full correction for those errors of measurement which tend to lower 
the reliability coefficients of both the test and the criterion. To illustrate, suppose the reliability 
coefficient of a test is 0.87 and that of the criterion is 0.80 and the obtained validity coefficient 


between the test and the criterion is 0.50. Now, the corrected validity coefficient, according to 
Equation 6.1, would be: 


$$r_c = \frac{0.50}{\sqrt{(0.87)(0.80)}} = \frac{0.50}{\sqrt{0.696}} = \frac{0.50}{0.834} \approx 0.60$$

It is obvious from Equation 6.1 that the correction for attenuation increases the validity
coefficient of the test. If the reliability coefficient of the test as well as the criterion is
1.00, correction for attenuation in terms of Equation 6.1 has no meaning. Likewise, when the
corrected validity coefficient (that is, $r_c$) is either 1.00 or very close to it, the test and
the criterion are no longer different; rather, they should be regarded as equivalent forms of the
same test. Students should note that the denominator in Equation 6.1 sets the upper limit of
correlation between the test and the criterion. In the above example the denominator is
$\sqrt{(0.87)(0.80)} = \sqrt{0.696} = 0.834$, which indicates that in that particular example,
the test and the criterion cannot correlate beyond 0.834.
It often happens that the criterion is far less reliable than the test itself. Hence the majority
of psychometricians are of the opinion that the correction for attenuation should be applied only
to the criterion and not to both. This is known as one-way correction for attenuation. When the
correction for attenuation is to be applied only in the criterion, Equation 6.1 is reduced to:

$$r_c = \frac{r_{xy}}{\sqrt{r_{yy}}} \qquad (6.2)$$

where the symbols are defined as those for Equation 6.1. To illustrate, suppose the validity
coefficient is 0.45 and the reliability coefficient of the criterion is 0.67; the corrected
validity coefficient would be:

$$r_c = \frac{0.45}{\sqrt{0.67}} = \frac{0.45}{0.818} = 0.550 \approx 0.55$$

It is obvious, thus, that when correction for attenuation is applied to the criterion only, it
tends to increase the validity coefficient of the test.
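Equations 6.1 and 6.2 are simple enough to check numerically. The sketch below reproduces the two worked examples from the text; the function names are hypothetical and chosen only for illustration.

```python
from math import sqrt


def correct_full(r_xy, r_xx, r_yy):
    """Full correction for attenuation (Equation 6.1): test and criterion."""
    return r_xy / sqrt(r_xx * r_yy)


def correct_criterion_only(r_xy, r_yy):
    """One-way correction for attenuation (Equation 6.2): criterion only."""
    return r_xy / sqrt(r_yy)


# Worked example for Equation 6.1: r_xy = 0.50, r_xx = 0.87, r_yy = 0.80.
full = correct_full(0.50, 0.87, 0.80)
# Upper limit of the obtainable test-criterion correlation in that example.
upper_limit = sqrt(0.87 * 0.80)
# Worked example for Equation 6.2: r_xy = 0.45, r_yy = 0.67.
one_way = correct_criterion_only(0.45, 0.67)

print(round(full, 2), round(upper_limit, 3), round(one_way, 2))
# prints: 0.6 0.834 0.55
```

Because each reliability coefficient is at most 1.00, the divisor never exceeds 1, which is why either correction can only raise (or leave unchanged) the obtained validity coefficient.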


Major Qualities Desired in a Criterion Measure 


In computation of the validity coefficient, identifying a satisfactory criterion measure is a very
important task. But this task is not so easy. For a satisfactory criterion measure,
tests-and-measurements research workers have identified four desired qualities: relevance, freedom
from bias, reliability and availability. A discussion follows.

(i) Relevance: A criterion with which the test is to be correlated must be a relevant one. A
relevant criterion very closely corresponds to the behaviours of the ultimate interest. We judge
any criterion to be relevant to the extent that the standing on the criterion measure corresponds to
the status on the trait one is going to predict. In general, relevance corresponds to the content
validity of the criterion measure. For example, for a test of abstract reasoning, a criterion which
closely assesses only abstract reasoning will be considered as a relevant criterion.

(ii) Freedom from Bias: Another important quality desired in the criterion is the freedom
from bias, which means that the measure should be such that each person has almost the same
opportunity to score high or low, or should be such that each person with equal capability
obtains the same score irrespective of the group to which he or she belongs. A criterion measure
that does not allow or restricts such opportunity, or produces variations in scores among persons
having equal capability, is said to be a biased one. Any criterion that contains such
bias cannot reveal relevant differences among people on the trait of interest. To
the extent that the criterion score depends on the personal characteristics of the persons or on
variations in the conditions of work rather than on the trait of interest, there is no meaning of
the correlation between test scores and the scores on the criterion measure.

(iii) Reliability: Reliability of the criterion scores is also essential. By reliability of
the criterion scores is meant that the criterion score must not vary widely from one situation to
another; rather, there must be stability or reproducibility in the criterion scores. If
the criterion score jumps around in an unpredictable way from one situation to another or from
day to day, so that a person who gets a high score on one day or in one situation gets a low
score on the other day or in another situation, there is no possibility that the test will
predict it and yield a high correlation with it.










(iv) Availability: Another quality of the criterion measure is availability. The criterion
measure should be one whose scores are conveniently available to the researcher who is
determining the validity coefficient. If one is required to wait for a long time for the
criterion scores, the criterion will not be easily available. Therefore, any choice of a
criterion must also take the practical limit into view.


Construct Validity


Construct validity is the third important type of validity. The term “construct validity” was
introduced in the Technical Recommendations of the American Psychological Association and since
then it has been frequently used by measurement theorists. Construct validation is a more
complex and difficult process than content validation and criterion-related validation. Hence an
investigator decides to compute construct validity only when he is fully satisfied that neither
any valid and reliable criterion is available to him nor any universe of content entirely
satisfactory and adequate to define the quality of the test is present. In other words, construct
validity is computed only when the scope for investigating criterion-related validity or content
validity is bleak. Thus the process of construct validation is required when, “no criterion or
universe of content is accepted as entirely adequate to define the quality to be measured”
(Cronbach & Meehl 1955, 282).

Construct validity has also been given other names such as factorial validity and trait
validity. In construct validity, the meaning of the test is examined in terms of a construct.
Anastasi (1968, 114) has defined it as “the extent to which the test may be said to measure a
theoretical construct or trait.” What is a construct? A construct is a nonobservable trait, such
as intelligence, which explains our behaviour. According to Nunnally (1970), a construct
indicates a hypothesis which tells us that “a variety of behaviours will correlate with one
another in studies of individual differences and/or will be similarly affected by experimental
treatments.” A few examples of constructs are anxiety, intelligence, verbal fluency,
extroversion, neuroticism, dominance, etc.
The process of validation involves the following steps:

1. Specifying the possible different measures of the construct

2. Determining the extent of correlation between all or some of those measures of construct

3. Determining whether or not all or some measures act as if they were measuring the
construct


A brief discussion of each of these steps for determining construct validity follows:
1. Specifying the possible different measures of the construct: This is the first step in any
construct validational study. Here the investigator explicitly defines the construct in clear words
and also states one or many supposed measures of that construct. There is no standard way of
stating the different measures of the construct. Specification of such measures is partly dependent
upon the previous researches conducted in that area and partly upon the intuition of the
investigator. Suppose one wants to specify the different measures of the construct ‘intelligence’.
The investigator would have, first, to define the term ‘intelligence’ and in the light of the
definition, he would be expected to specify the different measures. He may define intelligence as
an ability to think in a rational way, to act with a purpose and to make adjustment
according to the varying demands of the environment. From this definition, a number of
specifications may be made.
1. Quick decisions even in complex and difficult tasks
2. Ability to operate with symbols, figures, etc.
3. Ability to learn
4. Purposeful behaviour, or goal-related behaviour
5. Comprehension of verbal materials
6. Ability to modify behaviour
7. Verbal fluency
8. Logical and critical thinking
9. Ability to do numerical calculations rapidly

The above list of measures is not an exhaustive one. Other measures may be included. One demerit
is that if different investigators outline the measures of the construct, they may not be in
complete agreement among themselves regarding the specification. But this demerit applies only
to those constructs which have not been well defined or whose important aspects have not been
unearthed.

2. Determining the extent of correlation between all or some of the measures of construct:
When the measures of the construct have been outlined, the second step consists of
determining whether or not those well-specified measures actually lead to the measurement of
the concerned construct. This is done through an empirical investigation in which the extent to
which the various measures ‘go together’, or correlate with each other, is determined. If the
various measures highly correlate with each other or if the various measures are affected by a
variety of experimental manipulations (called independent variables) in much the same way, we
have sufficient evidence for concluding that they are measuring the same thing (that is, the same
construct). In an empirical investigation as above (where the correlation coefficients are
computed between the different measures of a construct), we may have to face any of the
following situations:

(a) Some measures may correlate highly with some measures, but correlate very little with
other measures present there. In other words, some measures tend to form a cluster and
all members of a cluster correlate highly with each other and poorly with the members
of other clusters. For example, out of the above nine measures, measure 2 and some others
may tend to form a cluster because they are expected to correlate
highly with each other but not to the same extent with other measures. Likewise,
measures 1, 3, 4 and 6 may form one cluster and measures 5, 7 and 8 may tend to form
another cluster.

(b) It may also be that all measures of a construct correlate highly with each other. This is an
ideal situation for investigating construct validity because in such a situation, the
investigator can safely conclude that all the proposed measures directly correlate with
the construct.

(c) The possibility that all the proposed measures correlate very little or all correlations are
near zero may also exist. If such is the result, the obvious conclusion is that the measures
are all independent and each of them measures a separate ability or trait.

In practice, however, such a stage is rarely reached and the investigator is often thrown
into a dispute before a final decision is taken.
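The second step, checking which supposed measures of a construct ‘go together’, can be sketched as the computation of all pairwise correlations. In the sketch below the four measure names, the six examinees' scores and the cut-off of 0.70 for a "high" correlation are all hypothetical assumptions made for illustration; the text itself does not fix any threshold.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of six examinees on four supposed measures of intelligence.
measures = {
    "symbols": [12, 15, 11, 18, 14, 20],
    "calculation": [10, 14, 9, 17, 13, 19],
    "verbal_comprehension": [20, 12, 18, 15, 22, 13],
    "verbal_fluency": [19, 11, 17, 13, 21, 12],
}

# Correlation for every pair of measures.
pairwise = {
    (a, b): pearson_r(measures[a], measures[b])
    for a, b in combinations(measures, 2)
}

# Assumed cut-off: pairs correlating at or above 0.70 are treated as clustering.
high_pairs = [pair for pair, r in pairwise.items() if r >= 0.70]
for pair in high_pairs:
    print(pair, round(pairwise[pair], 2))
```

With these made-up data the outcome resembles situation (a): the two numerical measures cluster together, the two verbal measures cluster together, and the two clusters do not correlate highly with each other.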
3. Determining whether or not all or some measures act as if they were measuring the
construct: When it has been determined that all or some measures of the construct
correlate highly with each other (providing sufficient evidence for the conclusion that they
measure the same thing), the next step is to determine whether or not such measures behave with
reference to other variables of interest in an expected manner. If they behave in an expected
manner, it means they are providing evidence for the construct validity. For example, the highly
correlating measures from among the above nine supposed measures of intelligence should show
at least some moderate correlation with teachers' ratings, grades in class, etc.

It is obvious from the above interpretation that unlike content validity and criterion-related
validity, the evidence for construct validity is always circumstantial. Construct
validation is also a difficult process because it contains several problems





f° 





concerning the definition of the construct, the specification of its measures and the relations
among those measures. Hence construct validation is undertaken by the test constructor only if
there is no suitable criterion and no adequate universe of content. There is no single method
for evaluating construct validity. However, the evidence for construct validity always rests on
a programme of research. Gregory has listed the kinds of research findings that would indicate
that a new test has construct validity:

(i) The test is homogeneous and, therefore, measures a single construct.


(ii) The test correlates more highly with related tests, instruments or variables than with
unrelated ones.

(iii) Developmental changes over time or across periods of the different ages are consistent
with the theory of the construct being assessed.

(iv) Group differences of the well-defined groups on the test are theory-consistent.

(v) Intervention effects produce changes on the test scores that are theory-consistent.

(vi) The factor analysis of the test scores produces results that are understandable in the
light of the theory by which the test was constructed.

In the hypothetical multitrait-multimethod matrix of Table 6.1, three traits (A, B and C) have
been measured by three methods (1, 2 and 3). The traits may be aggression (A), neuroticism (B)
and emotional stability (C), and the three methods may be a personality inventory (1), a
projective test (2) and experts' ratings (3). Thus A1 indicates the measurement of aggression on
the personality inventory. Likewise, C1 indicates the measurement of emotional stability on the
personality inventory and C3 indicates the measurement of emotional stability by the experts'
ratings.


Table 6.1 A Hypothetical Multitrait-Multimethod Matrix
‘on, it can be said that construct-related validity evidence, jn realy = | 
As an overall conclusion, salidiy evidences. For example, in construct Validating, Mees | 7 Mees Wetter 3 | 
subsumes all oftes ee essential step. Likewise, for providing construct-related eyiden, _ qrats B. B C4 2 &% 2 i. “ 
coment-related panera against many criteria. For example, a test of health status Might hye. aii 
for validity requires i ae inn with symptoms, doctor visits or other physiologies Be j 
~~. 2 “tele until very recently, tried to divide validity into different types. But ti. 559 (0.90) 
variables, Many le oe : divide validity into categories like construct- and criterion-telatad or BY “ 
reality 1s that any atternpt to because there is much similarity between construct validity and aa 038 0.37 (086) 
vality ' oY ae oe aa psychologists believe that construct-related evidence jr, - 
validity is the only real type of validity with which we should show concern because this validity subsumes the other so-called types of validity. According to the famous testing pioneer, Lee Cronbach, it is not appropriate to divide validity into three parts. Cronbach (1989, 91) has rightly maintained, "All validation is one and in a sense, all is construct validation." The 1999 edition of Standards for Educational and Psychological Testing also rejects the different categories of validity and instead recognizes different categories of evidence for validity.


CONVERGENT VALIDATION AND DISCRIMINANT VALIDATION 


Campbell & Fiske (1959) have demonstrated that the convergent validation and the discriminant validation are important for establishing satisfactory construct validity. For a satisfactory construct validity, it is essential that the test correlates with other theoretical measures with which it should correlate and it is equally essential that it must not correlate with other measures with which it should not. For example, a numerical aptitude test should correlate with an arithmetical reasoning test and algebra test but it should not correlate with a history or geography test. When the test correlates with its expected referents, the process is known as convergent validation and when a test correlates poorly (or not at all) with measures with which it should not, because it differs from those referents or measures, the procedure is called discriminant validation. In this sense, the low correlation sometimes becomes the evidence for the validity of the test.

In this way, it can be said that when a measure correlates well with other tests believed to assess the same construct, convergent validity is obtained. The evidence for convergent validity is obtained in one of two ways. First, we demonstrate that a test measures the same thing as other tests used for the same purposes. Second, we demonstrate a specific relationship that we can expect the test to show. For demonstrating discriminant validity, the test should have low correlations with measures of unrelated constructs or evidence for what the test does not measure.

In Table 6.1, all correlation values are hypothetical and include the reliability diagonals (or monotrait-monomethod values) given in parentheses and the validity diagonals (or monotrait-heteromethod values) given in italics. The correlation values between different traits measured by the same method have been shown by solid-line triangles (that is, by heterotrait-monomethod triangles) and the correlation values between different traits measured by different methods have been shown by broken-line triangles (that is, by heterotrait-heteromethod triangles). The cells of the matrix that contain values of correlations in brackets are called reliability diagonals and all the cells having correlations between the two broken-line triangles are called validity diagonals. The reliability diagonals and the nearby heterotrait-monomethod triangles constitute the monomethod block, and the validity diagonal and the heterotrait-heteromethod triangles, one on each side, constitute the heteromethod block.

Careful examination of Table 6.1 reveals the following four points for establishing convergent validity and the discriminant validity (thus, establishing the construct validity):
1. The values in the validity diagonal should be larger and statistically significant. This would be the evidence for the convergent validity.

2. The values in the validity diagonals should be higher than the values in the heterotrait-heteromethod triangles (shown by broken-line triangles). In other words, the validity coefficient should be higher than the correlation between traits measured by the different methods. This would be the evidence for discriminant validity and the data of Table 6.1 fully meet this requirement.

3. The values in the validity diagonals should be higher than the values in the heterotrait-monomethod triangles shown by solid lines. In other words, the validity coefficient should be higher than the correlation between different traits measured by the same method. If the opposite is the case, it means that the examinee's scores on the concerned method (may be the inventory or the projective technique or the experts' ratings) are affected by some irrelevant common factors lying within the examinee such as his social desirability (or undesirability) tendency, ability to comprehend the language of the items, etc. Data of Table 6.1 partially meet the above requirement. Fulfilment of this requirement provides evidence for discriminant validity.

4. Both the types of heterotrait triangles, that is, heterotrait-monomethod triangles and heterotrait-heteromethod triangles, should show within themselves some marked patterns of trait intercorrelation. This would also provide evidence for discriminant validity. Data of Table 6.1 meet this requirement to a greater extent.

If the validity coefficients meet the above requirements, they are said to have convergent validity and discriminant validity, which ultimately guarantee the establishment of construct validity. For a satisfactory construct validity, the requirements shown by at least the first three points must be met.
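The first three requirements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three correlation values (0.60, 0.30, 0.10) are invented and are not the entries of Table 6.1, the function names are hypothetical, and the comparison used here (lowest validity value against the highest value in each kind of triangle) is a stricter simplification of the cell-by-cell comparison usually made. Statistical significance is not tested.

```python
from itertools import combinations

TRAITS = ("A", "B", "C")   # e.g. aggression, neuroticism, emotional stability
METHODS = (1, 2, 3)        # e.g. inventory, projective test, experts' ratings
MEASURES = [(t, m) for m in METHODS for t in TRAITS]


def classify(a, b):
    """Name the kind of matrix cell for two distinct (trait, method) measures."""
    same_trait = a[0] == b[0]
    same_method = a[1] == b[1]
    if same_trait:
        return "validity"                  # monotrait-heteromethod
    if same_method:
        return "heterotrait-monomethod"    # solid-line triangles
    return "heterotrait-heteromethod"      # broken-line triangles


def hypothetical_r(a, b):
    """Invented correlations, chosen only to make the pattern visible."""
    values = {
        "validity": 0.60,
        "heterotrait-monomethod": 0.30,
        "heterotrait-heteromethod": 0.10,
    }
    return values[classify(a, b)]


def campbell_fiske_checks(r):
    """Group all off-diagonal cells and apply requirements 1 to 3."""
    cells = {
        "validity": [],
        "heterotrait-monomethod": [],
        "heterotrait-heteromethod": [],
    }
    for a, b in combinations(MEASURES, 2):
        cells[classify(a, b)].append(r(a, b))
    lowest_validity = min(cells["validity"])
    checks = {
        "1. validity values above zero": lowest_validity > 0,
        "2. validity > heterotrait-heteromethod":
            lowest_validity > max(cells["heterotrait-heteromethod"]),
        "3. validity > heterotrait-monomethod":
            lowest_validity > max(cells["heterotrait-monomethod"]),
    }
    counts = {kind: len(vals) for kind, vals in cells.items()}
    return checks, counts


checks, counts = campbell_fiske_checks(hypothetical_r)
print(counts)
for name, passed in checks.items():
    print(name, "->", passed)
```

With three traits and three methods there are 9 validity cells, 9 heterotrait-monomethod cells and 18 heterotrait-heteromethod cells, which is why the matrix is read triangle by triangle rather than value by value.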


The Unified View of Validity 


As discussed earlier, a new unified view of validity gradually emerged in the 1980s. According to this unified view, test validity is something of a misnomer because what is validated are the interpretation as well as the uses of test scores for a particular applied purpose, not the test itself (Cronbach 1988; Messick 1989; Shepard 1993). Validity requires an evaluation of the extent to which interpretations and uses of test results are justified by supporting evidence and in terms of the consequences of those interpretations and uses. Therefore, it involves an overall evaluative judgement (Linn & Miller 2011).

If the test constructor constructs a test and never uses it, validation is not an issue. If he administers the test to a group of examinees and obtains scores but locks those scores in an almirah, again validation is not an issue. But suppose he takes those test scores and uses them for a purpose; one may then ask or infer about the appropriateness of the test scores for that purpose or use. Inferences related to the need of validation are of two types: interpretative inference and action inference. By interpretative inference is meant what the test scores mean, that is, how we interpret them. By action inference is meant the claim regarding the appropriateness and utility of test scores forming the basis of some specific action.

In view of what has been said above, test validation is understood as simply the process of gathering evidence in support of those score-based inferences. Validation of interpretative inferences aims at bringing together several evidences in support of a certain interpretation of test scores as well as demonstrating that other interpretations are not very plausible. Validation of action inferences requires evidences regarding the meaning of test scores, their appropriateness and usefulness for particular applied purposes. Thus, the goal of test validation is to build and develop a strong case for the inferences the researcher would like to make.


Messick (1989) has expanded the boundaries of validity beyond the meaning of test scores to include the value implications of score interpretation and utility. He strongly emphasized that validation must go beyond evidence related to the meaning of test scores so that the consequences of score interpretation and test uses (both actual and potential) must also be considered. This is popularly known as Messick's expanded theory of validity. According to this theory, the concept of validity is expanded beyond issues of test score meaning to include the relevance and utility of test scores as well as the consequences of test interpretation and use.


STATISTICAL METHODS FOR CALCULATING VALIDITY

There are several statistical methods that are employed in computing the validity coefficient of a test. The important ones are enumerated below.


Correlation Method

The validity of a test is defined as its correlation with some outside independent criterion. The correlation coefficient can be calculated by different methods, the important ones being Pearson r, biserial r, point biserial r, tetrachoric r, phi coefficient and the multiple correlation. Of these, the use of Pearson r is very common. The biserial r and the point biserial r are used where one of the variables being correlated is divided into "Pass-Fail", "True-False", "Above median-Below median" and the other is given in terms of continuous scores. When, on the other hand, scores on the test as well as on the criterion are divided into two categories, the tetrachoric r or the phi coefficient are the most appropriate statistics. Multiple correlation is used where more than two variables are involved. R is a symbol for multiple correlation which indicates the relationship between one measure and the composite of two or more than two sets of measures.
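A minimal sketch of three of these coefficients follows, using only the Python standard library. The data are invented for illustration and the function names are hypothetical. It relies on two well-known identities: the point biserial r is the Pearson r computed with the dichotomous variable coded 0/1, and the phi coefficient is the Pearson r computed on two 0/1 variables. Biserial r, tetrachoric r and multiple R need distributional assumptions or matrix algebra and are not shown.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def phi_coefficient(x, y):
    """Phi from the 2 x 2 table of two variables coded 0/1."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Invented data for eight examinees.
test_scores = [12, 15, 9, 20, 18, 7, 14, 16]      # continuous test scores
criterion_scores = [40, 52, 35, 61, 58, 30, 41, 55]  # continuous criterion
passed_test = [0, 1, 0, 1, 1, 0, 1, 1]            # test dichotomised (pass/fail)
passed_criterion = [0, 1, 0, 1, 1, 0, 0, 1]       # criterion dichotomised

r_pearson = pearson_r(test_scores, criterion_scores)
r_point_biserial = pearson_r(test_scores, passed_criterion)
r_phi = phi_coefficient(passed_test, passed_criterion)

print(f"Pearson r        = {r_pearson:.3f}")
print(f"point biserial r = {r_point_biserial:.3f}")
print(f"phi coefficient  = {r_phi:.3f}")
```

The choice among the coefficients is driven purely by how the test and the criterion are recorded (continuous or dichotomous), not by the size of the relationship.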


Expectancy Table

The expectancy table is one way of showing the relation between the test scores and the criterion measures. In an expectancy table, the expectancy (usually expressed in terms of percentage or proportion) of criterion measures for each examinee is given against each test score. Thus a sort of bivariate distribution is prepared in which test scores are plotted against the criterion scores or ratings. From this bivariate distribution, the correlation coefficient (that is, validity coefficient) is estimated. As its name implies, through the expectancy tables the predictive validity of the test is determined. Estimates are usually based upon the probability that an examinee with a particular score on the test will obtain a specified score or rating in criterion performance. Expectancy tables, therefore, answer questions such as: What is the probability that an examinee will secure the first rank in the annual examination? What is the probability that an examinee at the 90th percentile in the clerical aptitude test will achieve a rating of "excellent" when posted? What is the probability that a person securing a score of 60 in an entrance test would be rated as "excellent" after the completion of the course? Expectancy tables intended to answer all such questions would differ in form depending upon the particular probabilities involved. However, all expectancy tables would have one thing in common: the probability of the score or rating on performance is predicted when the test score is known. To illustrate the use of expectancy tables, an example with hypothetical data has been provided in Table 6.2.
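How such a table is built can be sketched as follows. The scores and ratings below are invented (they are not the data of Table 6.2), and `expectancy_table` is a hypothetical helper: it groups examinees into score intervals of a fixed width and reports, for each interval, the percentage of examinees receiving each criterion rating.

```python
from collections import Counter, defaultdict


def expectancy_table(scores, ratings, width):
    """Return {(low, high): {rating: percentage}} for intervals of a given width."""
    counts = defaultdict(Counter)
    for score, rating in zip(scores, ratings):
        low = (score // width) * width        # lower limit of the score interval
        counts[low][rating] += 1
    table = {}
    for low in sorted(counts, reverse=True):  # highest interval first
        total = sum(counts[low].values())
        table[(low, low + width - 1)] = {
            rating: 100 * n / total for rating, n in counts[low].items()
        }
    return table


# Invented entrance-test scores and later criterion ratings for ten students.
scores = [125, 130, 121, 138, 127, 45, 50, 58, 41, 49]
ratings = ["excellent", "excellent", "excellent", "good", "excellent",
           "poor", "average", "poor", "poor", "average"]

table = expectancy_table(scores, ratings, width=20)
for interval, row in table.items():
    cells = ", ".join(f"{rating}: {pct:.0f}%" for rating, pct in row.items())
    print(f"{interval[0]}-{interval[1]}: {cells}")
```

Each row reads as a conditional probability: given a test score in that interval, the percentage is the estimated chance of earning the stated rating on the criterion.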


For a detailed discussion of the different correlational methods, students are advised to consult Guilford (1965) or McNemar (1962).




Table 6.2 Expectancy table showing relationship between Medical Entrance Test scores and criterion grade of 200 students of Patna Medical College











120-139 

100-119 
80-99 
60-79 
40-59 
20-39 

0-19 











0 studen = 
between 120-139 in the Medical Entrance Test (MET) , 80% were rated = "exes aca 
“good”, Likewise, of the 20 students SECUrINg scores in between 1604 3 4 
, 12% were rated as “excellent”, 20% were rated as “good” and 10% We I29 € 
“average”. Similarly, on the lower extreme, there were 12 students out of whom 60% wee 
as “worse”, 10% were rated as “poor” and 5% were rated as average, o WETE rated 


valid, and partly because the actual ability or potential of the examinees may not be revealed in a single test. As a consequence, some inferior examinees may be selected whereas some superior examinees may be rejected.


Miscellaneous Techniques 

Besides the above techniques for establishing the validity of the test, there are some additional procedures which are infrequently utilised. The comparison of test scores of the contrasted groups (such as normals and neurotics, institutionalized mental defectives and normal school-going children) is one way of establishing validity. Likewise, the percentage of the students successful in a particular age group may be compared with the similar percentage in another age group for establishing the validity of the test.


FACTORS INFLUENCING VALIDITY 
Validity of a test is influenced by several factors. Some of the important factors are enumerated below.


Length of the Test 

Homogeneous lengthening of the test not only increases the reliability but also the validity of the test. The longer the test, the more reliable and valid it becomes. Thus, lengthening the test or repeated administration of the same test increases the reliability, and since validity in a homogeneous test is dependent upon reliability, lengthening also increases the validity of the test. However, validity, as compared to reliability, does not change rapidly with increase in the length of the test or with repeated administrations of the same test. The increased validity of the test after lengthening is given by the following formula:

$$ r_{c(nx)} = \frac{n\, r_{cx}}{\sqrt{n + n(n-1)\, r_{xx}}} \tag{6.3} $$

where $r_{c(nx)}$ = correlation between criterion and the test lengthened n times; $r_{cx}$ = correlation between criterion and the test in original length; n = number of times the test has been lengthened or repeated on the same sample; $r_{xx}$ = reliability coefficient of the test.

To illustrate, suppose a test has a validity coefficient 0.50, a reliability 0.40 and it is lengthened four times its present length. Then its validity, according to Equation 6.3, would be:

$$ r_{c(4x)} = \frac{(4)(0.50)}{\sqrt{4 + 4(4-1)(0.40)}} = \frac{2}{\sqrt{4 + 4.8}} = \frac{2}{\sqrt{8.8}} = \frac{2}{2.966} = 0.674 \approx 0.67 $$

Thus, lengthening the present test four times or averaging four repeated administrations of the same test increases the validity coefficient from 0.50 to 0.67.
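Equation 6.3 is easy to check numerically. The sketch below is only an illustration; the function name `lengthened_validity` is hypothetical, and the inputs are those of the worked example (validity 0.50, reliability 0.40, n = 4).

```python
import math


def lengthened_validity(r_cx, r_xx, n):
    """Equation 6.3: validity of a test lengthened n times.

    r_cx: validity of the test at its original length
    r_xx: reliability of the test at its original length
    n:    number of times the test is lengthened (n = 1 means unchanged)
    """
    return n * r_cx / math.sqrt(n + n * (n - 1) * r_xx)


for n in (1, 2, 3, 4, 8):
    print(n, round(lengthened_validity(0.50, 0.40, n), 3))
```

Running the loop shows the point made in the text: validity rises with length, but slowly, and each further doubling buys less than the one before.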


Sometimes it becomes necessary to know how many times the test should be lengthened in order to secure a certain specified level of validity coefficient. In such a situation, Equation 6.3 may be rearranged as shown in Equation 6.4 so that it may be known beforehand how many times the test should be lengthened.

$$ n = \frac{r_{c(nx)}^{2}\,(1 - r_{xx})}{r_{cx}^{2} - r_{c(nx)}^{2}\, r_{xx}} \tag{6.4} $$

where n = number of times the test should be lengthened or repeated on the same sample; $r_{c(nx)}$ = desired level of validity coefficient; $r_{cx}$ = validity coefficient between the criterion and the test; and $r_{xx}$ = reliability coefficient of the test.

To illustrate, suppose a test has a reliability coefficient of 0.40 and validity coefficient of 0.50. How many times should the test be lengthened in order to reach a validity coefficient of 0.65? To answer this, Equation 6.4 may be solved as follows:

$$ n = \frac{(0.65)^{2}(1 - 0.40)}{(0.50)^{2} - (0.65)^{2}(0.40)} = \frac{(0.42)(0.60)}{0.25 - (0.42)(0.40)} = \frac{0.252}{0.25 - 0.168} = \frac{0.252}{0.082} = 3.07 \approx 3 $$

Thus, in order to reach the desired level of validity coefficient of 0.65, the present test should be lengthened three times.
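The same calculation can be sketched in Python. The function name `required_lengthening` is hypothetical, and the guard against a non-positive denominator is an addition made here: it reflects the fact that a target above $r_{cx}/\sqrt{r_{xx}}$ cannot be reached by any amount of lengthening. Without the intermediate rounding of 0.4225 to 0.42, the result is about 3.13 rather than 3.07, which still rounds to three.

```python
def required_lengthening(r_target, r_cx, r_xx):
    """Equation 6.4: how many times a test must be lengthened.

    r_target: desired validity coefficient
    r_cx:     present validity coefficient
    r_xx:     present reliability coefficient
    """
    denominator = r_cx ** 2 - r_target ** 2 * r_xx
    if denominator <= 0:
        raise ValueError("target validity cannot be reached by lengthening")
    return r_target ** 2 * (1 - r_xx) / denominator


n = required_lengthening(0.65, 0.50, 0.40)
print(round(n, 2))
```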
Equation 6.3 gives an estimate of the validity coefficient only when the test has been lengthened n times. But sometimes the investigator increases the length of the test as well as the length of the criterion. In such a case, Equation 6.3 would not be a correct estimate of the validity coefficient, and for such a situation another formula is suggested:

$$ r_{(mc)(nx)} = \frac{r_{cx}}{\sqrt{\left[\dfrac{1 - r_{xx}}{n} + r_{xx}\right]\left[\dfrac{1 - r_{cc}}{m} + r_{cc}\right]}} \tag{6.5} $$

where $r_{(mc)(nx)}$ = correlation between the test lengthened n times and criterion lengthened m times; $r_{cx}$ = correlation between the criterion and the test in original length; $r_{xx}$ = reliability of the test; n = number of times the test has been lengthened; m = number of times the criterion has been lengthened; $r_{cc}$ = reliability coefficient of the criterion.

To illustrate, suppose the test has been lengthened three times and the criterion has also been lengthened three times, the reliability coefficients of the test and the criterion are 0.58 and 0.56 respectively, and the correlation of the test with the criterion was 0.65. Its validity, according to Equation 6.5, would be:

$$ r_{(3c)(3x)} = \frac{0.65}{\sqrt{\left[\dfrac{1 - 0.58}{3} + 0.58\right]\left[\dfrac{1 - 0.56}{3} + 0.56\right]}} = \frac{0.65}{\sqrt{(0.72)(0.707)}} = \frac{0.65}{0.713} = 0.91 $$

Thus, lengthening the test and the criterion three times the original length increased the validity coefficient from 0.65 to 0.91. Now one question arises: What would happen to the reliability and the validity of the test if the test is lengthened indefinitely? If the test is lengthened indefinitely, its reliability would be perfect but validity would not be perfect because the variance of the test may still have some distinctiveness or specificity. The maximum validity when the test is lengthened indefinitely can be estimated by the following formula:

$$ r_{c(\infty x)} = \frac{r_{cx}}{\sqrt{r_{xx}}} \tag{6.6} $$

where $r_{c(\infty x)}$ = maximum validity when the test has been lengthened indefinitely; $r_{cx}$ = obtained correlation (or validity coefficient) between the test and the criterion; $r_{xx}$ = reliability coefficient of the test.

A close scrutiny of Equation 6.6 reveals that it estimates the maximum validity of the test (when the test has been lengthened indefinitely) through the index of reliability, which is the denominator of the formula.
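Both formulas can be sketched together. The form of Equation 6.5 used below is the standard one for lengthening both test and criterion and is an assumption here; the function names are hypothetical and the inputs are illustrative. Setting m = 1 (criterion unchanged) reduces it to Equation 6.3, and letting n grow without bound approaches the ceiling of Equation 6.6.

```python
import math


def validity_both_lengthened(r_cx, r_xx, r_cc, n, m):
    """Equation 6.5 (assumed form): test lengthened n times, criterion m times."""
    test_term = (1 - r_xx) / n + r_xx
    criterion_term = (1 - r_cc) / m + r_cc
    return r_cx / math.sqrt(test_term * criterion_term)


def max_validity(r_cx, r_xx):
    """Equation 6.6: validity when the test is lengthened indefinitely."""
    return r_cx / math.sqrt(r_xx)


# Illustrative inputs: validity 0.65, reliabilities 0.58 (test) and 0.56 (criterion).
print(round(validity_both_lengthened(0.65, 0.58, 0.56, n=3, m=3), 3))

# Validity 0.50 and reliability 0.40: the ceiling that no lengthening can exceed.
print(round(max_validity(0.50, 0.40), 3))
print(round(validity_both_lengthened(0.50, 0.40, 1.0, n=10_000, m=1), 3))
```

The last two printed values agree to three decimals, which illustrates why Equation 6.6 is described as the limiting case.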


Range of Ability (or Sample Heterogeneity) 


Like reliability, validity is also influenced by the range of ability of the samples used. If the subjects have a very limited range of ability (that is, a wider range of scores is not possible), the validity coefficient will be low. On the other hand, if the subjects have a wider range of ability so that a wider range of scores is obtained, the validity coefficient of the test would be enhanced.
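The effect of a restricted range can be seen in a small simulation. Everything below is an assumption made for illustration: test and criterion are generated as a common ability plus independent error (so the correlation in the full group is about 0.50), and the restricted group is simply the top fifth of scorers on the test.

```python
import math
import random


def pearson_r(x, y):
    """Product-moment correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)                       # fixed seed so the run is repeatable
ability = [rng.gauss(0, 1) for _ in range(5000)]
test = [a + rng.gauss(0, 1) for a in ability]
criterion = [a + rng.gauss(0, 1) for a in ability]

r_full = pearson_r(test, criterion)

# Keep only examinees in the top 20% on the test (a narrow range of ability).
cutoff = sorted(test)[int(0.8 * len(test))]
kept = [(t, c) for t, c in zip(test, criterion) if t >= cutoff]
r_restricted = pearson_r([t for t, _ in kept], [c for _, c in kept])

print(f"full range:       r = {r_full:.2f}")
print(f"restricted range: r = {r_restricted:.2f}")
```

The test itself is unchanged between the two computations; only the heterogeneity of the sample differs, which is why a validity coefficient should always be reported together with a description of the group on which it was obtained.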


Ambiguous Directions 


If the directions of the test are ambiguous, they would be interpreted differently by different examinees. Moreover, such directions tend to encourage guessing on the part of the examinees. As a consequence, the validity of the test would be lowered.


Socio-cultural Differences


Cultural differences among different societies are likely to affect the validity of a test. A particular test developed in one culture may not be valid for another culture because of the differences in socio-economic status, sex ratios, social norms, etc. However, when a test is cross-cultured, this factor does not affect the validity of the test.

Addition of Inappropriate Items 


When inappropriate items, particularly vague ones whose difficulty values differ widely from the original items, are added to the test, they are likely to lower both the reliability and the validity of the test.





CROSS-VALIDATION

Psychometricians are of the view that validation of a completed test should be done on a sample of examinees other than the one used for item analysis. This process is called cross-validation. Thus the term cross-validation refers to the practice of revalidating a test on a new sample of examinees. The process of cross-validation is necessary because the validity of the test, when computed from the standardization sample, tends to be spuriously high because items are selected so as to maximize differences among that sample's scores. Apart from this, the validity coefficient of the test will be increased by chance factors peculiar to the standardization sample. In fact, a high validity coefficient can result under such circumstances even when the test has little validity. A validity coefficient computed on the same sample that was used in item analysis will capitalize on random sampling errors within that particular sample and will be spuriously high.
A specific demonstration of the need for cross-validation using empirical data has been provided by an early investigation conducted with the Rorschach test (Kurtz 1948). The Rorschach test was administered to 80 sales managers with the basic purpose of determining whether the test would be of any help in selecting such managers for life insurance agencies. In fact, these managers had been selected from several hundred employed by eight life insurance companies and comprised an upper criterion group consisting of 42 managers considered very satisfactory by their companies and a lower criterion group of 38, considered unsatisfactory by their companies. The eighty test records were studied by the Rorschach experts who selected a total of 32 signs or response characteristics, occurring more frequently in one criterion group than in the other. Signs found more often in the upper criterion group were given a score of +1 if present and 0 if absent. Similarly, those more common in the lower group were given -1 if present and 0 if absent. When the scoring key based on these 32 signs was reapplied to the original group of 80 persons, 79 of the 80 managers were classified rightly as being in an upper or lower group. Thus the correlation between the test score and the criterion was nearly perfect, that is, 1.00. However, when the test was administered on a new sample of 41 managers (21 in the upper and 20 in the lower group) for the purpose of cross-validation, the validity coefficient dropped to .02. Thus, it became apparent that the scoring key developed in the first sample had no validity for selecting such personnel.


The effect of chance factors upon validity that may operate due to the use of a single sample for item selection and test validation can be shown through this example: suppose that the items of a test which intends to select retail salesmen have been so selected as to yield maximum discrimination in terms of the top 27% and bottom 27% of a sample of sales personnel. In fact, some irrelevant factors may be present in this group, some of which may be correlated with scores on the test. Such factors will often be correlated with responses to the items in one extreme group more than in the other group. If the test is validated using the same sample, it is likely that those incidental factors will make the validity coefficient spuriously high. But if the test is cross-validated on a new group of retail salesmen, and unless those incidental factors are present in this new group to the same degree (which is not likely), the validity coefficient of the completed test will be lower in the new group than in the original standardization group. In this way, the validity coefficient tends always to be unnecessarily high in the standardization group, thus making cross-validation necessary. Cureton (1950) had earlier also established that the use of a single sample for item selection and test validation can produce a completely spurious validity coefficient under pure chance conditions.

Now the question: What are those factors on which the amount of shrinkage of a validity coefficient in cross-validation depends? As we know, by validity shrinkage is meant a phenomenon wherein a test predicts the relevant criterion less accurately with the new sample than with the original standardization sample. The answer to the question just cited is that such shrinkage depends partly upon the size of the original item pool as well as upon the proportion of items retained finally, and partly upon the size of the sample. When the initial item pool is large and the proportion of items retained is small, there is greater opportunity to capitalize on chance differences and therefore, shrinkage of test validity in cross-validation will be the greatest. Likewise, when the size of the samples is small, there will be greater validity shrinkage in cross-validation. It has also been observed that when items are assembled without any previously formulated rationale, the shrinkage in validity coefficient in cross-validation will obviously be greater. Mitchell and Klimoski (1986) have neatly demonstrated the differences in validity shrinkage that really occur when items are chosen by rational or by empirical strategies.
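Validity shrinkage of this kind can be reproduced with pure chance data. The simulation below is a sketch under loudly stated assumptions: every item response and every criterion classification is a coin toss (so the true validity is zero), the pool of 200 items, the 32 retained items and the sample of 80 are arbitrary choices, and the function names are hypothetical. Items are keyed +1 or -1 according to the direction of their chance correlation with the criterion in the first sample, and the same key is then applied to a fresh sample.

```python
import math
import random


def pearson_r(x, y):
    """Product-moment correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def random_sample(rng, n_persons, n_items):
    """Criterion group (0/1) and item responses (0/1), all unrelated coin tosses."""
    criterion = [rng.randint(0, 1) for _ in range(n_persons)]
    responses = [[rng.randint(0, 1) for _ in range(n_items)]
                 for _ in range(n_persons)]
    return criterion, responses


def total_scores(responses, key):
    """Apply a scoring key given as (item index, weight) pairs."""
    return [sum(weight * row[j] for j, weight in key) for row in responses]


def shrinkage_demo(seed=0, n_persons=80, n_items=200, n_keep=32):
    rng = random.Random(seed)
    crit_a, resp_a = random_sample(rng, n_persons, n_items)
    # Item analysis on the first sample: keep the most "discriminating" items.
    item_r = [(pearson_r([row[j] for row in resp_a], crit_a), j)
              for j in range(n_items)]
    best = sorted(item_r, key=lambda pair: abs(pair[0]), reverse=True)[:n_keep]
    key = [(j, 1 if r > 0 else -1) for r, j in best]
    r_original = pearson_r(total_scores(resp_a, key), crit_a)
    # Cross-validation: the same key applied to a new, independent sample.
    crit_b, resp_b = random_sample(rng, n_persons, n_items)
    r_cross = pearson_r(total_scores(resp_b, key), crit_b)
    return r_original, r_cross


r_original, r_cross = shrinkage_demo()
print(f"validity in the item-selection sample: {r_original:.2f}")
print(f"validity in the cross-validation sample: {r_cross:.2f}")
```

Enlarging `n_items` or shrinking `n_keep` or `n_persons` inflates the first coefficient further, which mirrors the three factors named above: size of the item pool, proportion of items retained, and size of the sample.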


EXTRAVALIDITY CONCERNS 


Actual uses of psychological tests have revealed that even if a test is fair, valid and unbiased, the decision to use it is governed by several considerations. These considerations or concerns are technically called extravalidity concerns. Cole and Moss (1989) have pointed out the following four major extravalidity concerns:

(i) What is the purpose for which the test is used?

(ii) To what extent are those purposes accomplished by the actions done?

(iii) What are the likely side effects or unintended consequences of using a test?

(iv) What possible alternatives to the test may serve the same purpose?

Of these four major types of extravalidity concerns, the side effects and unintended consequences of testing are important and therefore, we shall concentrate upon them. Those readers who want to refer to other types should consult Cole and Moss (1989), Cronbach (1989) and Jensen (1980).


When a psychological test is used, of course it produces the intended effect. But apart from this, some unintentional side effects are produced. Such effects cost a lot, and the examiner has to determine whether the benefits of giving the test outweigh the costs of potential side effects. The consideration of these side effects should influence the examiner's decision to use a particular test for a particular purpose. The examiner may be forced not to use a test for a worthy purpose if he feels that the likely costs from the unintended side effects outweigh the expected benefits.

The unintended side effects of testing may be illustrated through some examples. Cole and Moss (1989) have cited the example of using psychological tests for determining eligibility for special education. Although the intended purpose of using psychological tests here is to help students in learning and strengthening their mental status, such a process may produce several types of unintended negative side effects:

(i) Teachers may not tolerate these children and view them as unworthy of attention.
(ii) The identified children may feel unusual.
(iii) Other children may call such identified children names.
(iv) Teachers may have to arrange a special class segregated by race or social class.

Likewise, suppose that MMPI (Minnesota Multiphasic Personality Inventory) is used to screen candidates for the job of police officer. As we know, MMPI basically measures pathological traits or dimensions of personality. Those candidates who become unsuccessful due to high scores on some dimensions of MMPI, such as Psychopathic Deviate, Schizophrenia, Depression, Hypochondriasis, etc., may be tagged with a pathological label which may debar them from being appointed in other fields too. The repercussions not only affect the candidates but they also raise the probability of lawsuits.


Validity 123 


4 measures what it intends to measure, they 
valid 1 Mes that states that a test is valid if it serves th 

est a valley cick 1995). Such functionalist perspective r 

2finitiony 1988; ga only to enforce constructive conseque 

(cron obligatio’ (Messick 1980). ff ionalist p \ | ined that this new 
_guice nd other supporters o unctionalist perspective have opined that this ne 

rse sic 1999! of validity rests on the following four bases: 

1 


have preferred a functionalist 
© purpose for which it is used 
ecognizes that the test validator 
neces but also to guard against 


jtrat 
wide mn cient evi 
( 6 nt Tie value implications of the test interpretation 
disc analysis oF as usefulness of test interpretation 
“ dence ee the potential and actual social consequences including, negative side 
of the test . . | | 
to functionalist perspective, a valid test is one that answers well to all these four 


dence for construct validity, that is, for appropriate convergent validity and 


(iii) Evi 
.) Asse 
(iv) 
fects of the 
a according 


RELATION OF VALIDITY TO RELIABILITY
Reliability and validity are two dimensions of the same thing, that is, test efficiency. Validity is the correlation of the test with some outside independent criteria, and reliability is the self-correlation of the test. A test which does not correlate with itself is not expected to correlate with outside independent criteria. In other words, a test which has poor reliability is not expected to yield high validity. Thus, validity is dependent upon reliability. This prediction is true for the homogeneous test only. If a test is heterogeneous (such as a biographical-data blank), validity may be high even without high reliability (particularly the internal consistency reliability). This is because in a heterogeneous test, each part measures an independent function. As such, the reliability of the test, particularly the internal consistency reliability, would be low because items would not highly correlate with each other. But when such items are correlated with similar sets of independent items, they are expected to yield a high correlation. Thus, validity may be high without underlying high internal consistency reliability. In general, however, reliability is a necessary but not a sufficient condition for validity. The validity of the test may be higher than the reliability but not higher than the index of reliability, because the index of reliability sets the maximum limit of correlation that the test can yield with its true measures, the true measures being the criterion. Theoretically, the maximum validity coefficient between two variables is equal to the square root of the product of their reliabilities, that is, $r_{12(\max)} = \sqrt{r_{11}\, r_{22}}$, where $r_{11}$ and $r_{22}$ are the reliabilities of the two tests/variables. For example, if the reliability of the test is 0.8 and the reliability of the criterion is 1.0, then the maximum validity would be $\sqrt{0.8 \times 1.0} = \sqrt{0.8} = 0.89$.
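The ceiling on validity described above can be checked numerically. The following is a minimal sketch in Python; the function name `max_validity` is an illustrative assumption and does not come from the text.

```python
import math


def max_validity(test_reliability: float, criterion_reliability: float) -> float:
    """Theoretical upper limit of the validity coefficient.

    Implements sqrt(r11 * r22), where r11 and r22 are the reliabilities
    of the test and of the criterion respectively.
    """
    for r in (test_reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(test_reliability * criterion_reliability)


# Worked example from the text: test reliability 0.8, criterion reliability 1.0
print(round(max_validity(0.8, 1.0), 2))  # 0.89
```

Lowering either reliability lowers the ceiling, which is why an unreliable criterion limits the validity coefficient just as much as an unreliable test does.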


A test constructor should not always aim at having high reliability and high validity in the same test. If he does so, he is said to be working at cross purposes because sometimes the goals of reliability and validity are incompatible. This is because the requirements for high reliability and high validity are opposite to each other. High reliability requires items of equal difficulty and high intercorrelations between the items, whereas high validity requires items of different difficulty values and low intercorrelations among items (Guilford 1956). Obviously, attempting high validity as well as high reliability would imply working at cross purposes. Fortunately, a compromising and satisfactory solution to this puzzle was proposed in 1946. According to it, if inter-item correlations range from 0.10 to 0.60, one can expect to have both the reliability and the validity to a satisfactory degree.


Due to renewed sensitivity to extravalidity concerns, many psychometricians have preferred to widen their definition of test validity. Instead of defining test validity in technical terms, that is, [...]





REVIEW QUESTIONS
1. Discuss the steps involved in establishing construct validity.
2. Discuss the factors that influence the validity of test scores.
3. Write notes on the following:
(a) Concurrent validity
(b) Correction for attenuation
(c) Discriminant validation
(d) Relation of validity to reliability
4. Explain the concept of cross-validation. Discuss its relevance for test construction.
7
NORMS AND TEST SCALES

Meaning of Norm-Referencing and Criterion-Referencing
Steps in Developing Norms
Types of Norms and Test Scales
  Age-equivalent Norms
  Grade-equivalent Norms
  Percentile Norms (or Percentile-rank Norms)
  Standard Score Norms
Computer Applications in Psychological Testing and Assessment
Criteria of Good Test Norms

MEANING OF NORM-REFERENCING AND CRITERION-REFERENCING 
An individual's performance in any psychological and educational test is recorded in terms of raw scores. Raw scores are expressed in terms of different units, such as the number of trials taken within a specified period to reach a criterion; the number of correct responses given by the examinees; the number of wrong responses given; the total time taken in assembling the objects; and the like. All these raw scores convey no meaning in themselves. For example, when A has a score of 40 on the arithmetical reasoning test and a score of 30 on the history test, would it mean that A is superior, inferior or equivalent on the arithmetical reasoning test as compared with the history test? In the absence of some interpretative data, it is difficult to answer. Usually, there are two reference points that are applied in interpreting the test scores (that is, in interpreting what the scores tell us about the examinees with respect to the characteristics being measured).

The first way is to compare an examinee's test score with the scores of a specific group of examinees on that test. This process is known as norm-referencing, and a test based on norm-referencing is called a norm-referenced test. In simple words, a norm-referenced test compares each person with a norm. The purpose of a norm-referenced test is to classify persons from low to high across a continuum of ability or achievement. Such a test, therefore, uses a representative sample of individuals, the norm group, as its interpretative framework. Researchers might want to classify individuals in this way for the purpose of selection to a specialized course or placement in a remedial or gifted programme. When raw scores are compared to the norms, a scientific meaning emerges. Norms may be defined as the average performance on a particular test made by a standardization sample. By a standardization sample is meant a sample which is truly representative of the population and takes the test for the express purpose of providing data for comparison and subsequent interpretation of the test scores. For adequate representation, the sample must include a cross-sectional representation of different parts of the population. In order to compare the raw scores with the performance of the standardization sample, they are converted into what are called 'derived scores'. There are two reasons for such conversions. First, the derived scores permit direct comparison of the person's own performances on different tests, because they are expressed in the same units for different tests (whereas raw scores cannot be expressed in the same unit for different tests).


Second, the derived scores denote the person's relative position in the standardization sample, so that his performance may be evaluated in comparison to other persons.

The second way of interpreting a test score is to establish an external criterion and to compare the examinee's test score with it. This process is known as criterion-referencing. Here the test is referenced to a fixed performance criterion. If an examinee passes a certain number of items (the criterion) or answers them correctly, he is said to be capable of the total performance demanded by the test. Thus the criterion-referenced test is one in which the performance is linked or related to some behavioral measures or referents (Glaser 1963). The question which arises here is: From where do we derive the criteria with which the test is referenced? A common criterion for referencing a test is training which results in an increase in skill or proficiency. A group is given training, and the group is measured by a test before and after the training. If the post-training scores are better than the pretraining scores, the difference can be considered as due to the results or outcomes of the training. In other words, the results can be interpreted as an indication of increased skill or proficiency. Such a test constitutes an example of the criterion-referenced test. The main features of a criterion-referenced test are as follows:

• The test is usually based upon a set of behavioral referents, which it intends to measure.
• The test represents samples of actual behaviour or performance.
• Performance on a criterion-referenced test can be explained in terms of predetermined cut-off scores.


It should be noted that a test cannot be automatically called a criterion-referenced test simply because it has no norms. A criterion-referenced test describes the specific types of skills, tasks or knowledge that the examinee or test-taker can demonstrate. Such tests are used to compare the examinee's accomplishment to a predefined performance standard. Here the examinee's result is compared with the predetermined standard, and no comparison is made with other examinees. In fact, it is entirely possible for all examinees to exceed the predetermined standard. Thus, the central focus in criterion-referenced tests is on what the examinee can do rather than on comparisons to the performance levels of others. In this way, criterion-referenced tests identify an examinee's relative mastery (or nonmastery) of specific competencies. Today, these kinds of tests are very popular in education systems.
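The contrast between the two ways of interpreting a score can be illustrated with a short sketch in Python. The norm-group scores, the cut-off of 28 and the function names are hypothetical values chosen only for illustration; they are not taken from the text.

```python
def percentile_rank(score, norm_group):
    """Norm-referenced reading: percentage of the norm group
    scoring at or below the given score."""
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100.0 * at_or_below / len(norm_group)


def meets_criterion(score, cutoff):
    """Criterion-referenced reading: has the examinee reached
    the predetermined cut-off score?"""
    return score >= cutoff


# Hypothetical data for illustration only
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
cutoff = 28
examinee_score = 27

print(percentile_rank(examinee_score, norm_group))  # 70.0
print(meets_criterion(examinee_score, cutoff))      # False
```

The same raw score of 27 looks fairly good against the norm group yet falls short of the predetermined standard, which shows that the two interpretations answer different questions.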

A criterion-referenced test differs from a norm-referenced test. A major difference between the two is the manner in which the test content is chosen. As we know, in a norm-referenced test, the items are selected in such a manner that they provide maximal discrimination among examinees. Item-analysis procedures are used to identify ideal items according to the difficulty level, correlation with total scores and several other properties.

The specific differences between a norm-referenced test and a criterion-referenced test are 

as under: 

(i) The purpose of the criterion-referenced test is to compare the examinees’ performance to 
some predetermined criterion or standard. On the other hand, the purpose of the 
norm-referenced test is to compare the examinees’ performance to one another. 

(ii) Items of the criterion-referenced test reflect narrow domain of skills with real-world 
relevance whereas items of the norm-referenced test reflect broad domain of skills with 
indirect relevance.

(iii) In criterion-referenced tests, scores are usually expressed as a percentage with a predetermined passing level, whereas in norm-referenced tests scores are usually expressed as a standard score, percentile or grade equivalent.

(iv) Another feature that makes a criterion-referenced test distinct from a norm-referenced test is that in a criterion-referenced test most items are of similar difficulty level, whereas in a norm-referenced test items vary widely in difficulty level.

STEPS IN DEVELOPING NORMS
Developing norms requires following the proper steps. The following are the three important steps in developing norms:

1. Defining the target population
2. Selecting the sample from the target population
3. Standardizing the conditions for proper implementation of the test

These steps may be discussed as follows:

1. Defining the target population:
The first step in developing norms is to define the composition of the target group. A test is intended to be used for a particular type of person or group of persons. The composition of the target group (also called the normative group) is determined by the intended use of the test. Let us suppose that the test constructor has constructed the Test of English as a Foreign Language (TOEFL). Obviously, the test is intended for those students whose native tongue is not English but who are studying where the medium of instruction is English. Thus, for TOEFL the target population will consist of those students who are relevant and appropriate. For TOEFL, PhD candidates will be an example of an inappropriate one.

2. Selecting the sample from the target population:
When the test constructor has defined the target population, he proceeds to select a representative sample from each target population or group. To make the selected sample representative of the target population, a cross-sectional representation of the target population must be made. A cross-sectional representation is one in which people from all sections of the target population are represented. For ensuring the representative character of the sample, various methods of sampling are employed. Generally, for constructing norms a larger sample is preferred. For constituting a larger sample, a completely random sampling technique is the best one; but due to its impracticability, this technique is seldom employed. Generally, cluster sampling or a variation of it is preferred.

3. Standardizing conditions for proper implementation of the test:
Unless conditions of test administration are standardized, valid and proper comparison of individual test scores to test norms is impossible. Therefore, factors like adequate sound control, lighting, ventilation and temperature of the working space must be controlled. Above all, factors like test timing, test security, adherence to instructions and assuring that the examinees work on the proper test sections are also part of standardizing the testing conditions. Without these standardization procedures, norms cannot serve as a useful comparative device.

These are the important steps in developing norms of a test. For norms to serve as a useful comparative device, these steps must be covered thoroughly.

TYPES OF NORMS AND TEST SCALES
Ordinarily, the derived scores may be divided into four common types and, depending upon each of these four scores, there are four types of norms. The four derived scores are age scores, grade scores, percentile scores and standard scores (Lyman 1963). Accordingly, there are four types of norms: age norms, grade norms, percentile norms and standard-score norms. A detailed critical discussion of each of these is given below.

Age-equivalent Norms 
Age-equivalent norms are defined as the average performance of a representative sample of a certain age level on the measure of a certain trait or ability. If, for example, we measure the weight of a representative sample of 10-year-old girls of the state of Bihar and find the average of the obtained weights, we can determine the age norm for the weight of 10-year-old girls.

Age norms are most suited to those traits or abilities which increase systematically with age. Since most physical traits like weight and height, and cognitive abilities like general intelligence, show such systematic change during childhood and adolescence, age norms can be more appropriately used for these traits or abilities at the elementary level.

There are some disadvantages of age norms.

1. Age norms lack a standard and uniform unit throughout the period of growth of physical and psychological traits. As pointed out earlier, age norms are suited to traits or abilities which show a progressive growth with advancement of age. Consider the example of general intelligence. The growth in the level of general intelligence from age 8 to 9 is in no way equal to the growth from 3 to 4 or from 14 to 15. This is true because the growth at the earlier level is faster than the growth at a later level, and it almost comes to a halt after 16 to 17. Not only this, the problem becomes further aggravated by the fact that even at a particular age level the rate of growth of general intelligence (or of traits like height and weight) is not uniform for all children and sometimes varies to a great extent. In such a situation, we are forced to view with skepticism the meaning of equality of age units used in age norms.

2. Another problem in age norms arises from the fact that the growth rates of some traits are not comparable. For example, progress in maze learning does not ordinarily take place after adolescence, but progress in vocabulary continues even after adolescence. In such a situation, the age norms for these two traits cannot be compared.

3. A trait like acuity of vision cannot be expressed in terms of age norms because this trait does not exhibit progressive change over the years. Many personality traits would also fall into this category.


Grade-equivalent Norms 
Like age-equivalent norms, grade-equivalent norms are defined as the average performance of a representative sample of a certain grade or class. The test whose norms are being prepared is given to a representative sample selected from each of several grades or classes. After that, the average performance of each grade on the given test is determined, and then grade equivalents for the in-between scores are determined arithmetically by interpolation. This average performance is known as the grade-equivalent norm. Thus, if the average number of items done correctly on the arithmetic test is 30 by the sixth grade that constitutes a representative sample, then a raw score of 30 becomes a grade norm of six. In this way, the grade-equivalent norms indicate the grade levels at which the performance of representative groups is average.
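The interpolation step mentioned above can be sketched as follows. This is only an illustration in Python: apart from the sixth-grade mean of 30 taken from the example in the text, the grade means are hypothetical, the function name is an assumption, and the sketch assumes that mean scores rise strictly from one grade to the next.

```python
def grade_equivalent(raw_score, grade_means):
    """Convert a raw score into a grade equivalent by linear
    interpolation between the mean scores of adjacent grades.

    grade_means maps grade -> average raw score of that grade's
    representative sample (assumed strictly increasing with grade).
    """
    points = sorted(grade_means.items())  # (grade, mean) pairs ordered by grade
    for (g_lo, m_lo), (g_hi, m_hi) in zip(points, points[1:]):
        if m_lo <= raw_score <= m_hi:
            fraction = (raw_score - m_lo) / (m_hi - m_lo)
            return g_lo + fraction * (g_hi - g_lo)
    raise ValueError("raw score lies outside the range covered by the norms")


# Hypothetical grade means; only the sixth-grade value of 30 comes from the text
grade_means = {4: 18, 5: 24, 6: 30, 7: 34}

print(grade_equivalent(30, grade_means))  # 6.0
print(grade_equivalent(27, grade_means))  # 5.5
```

A raw score halfway between the fifth-grade and sixth-grade means is thus reported as a grade equivalent of 5.5, even though no group at that level was actually tested.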
Grade-equivalent norms also have some limitations.

1. Grade-equivalent norms of the same student in different subjects are not comparable. For example, the grade-equivalent in social studies of a student cannot be compared with the grade-equivalent in arithmetic of the same student, because everyday life experiences outside the school may contribute to the knowledge of social studies whereas knowledge of arithmetical concepts is primarily dependent upon formal training in the class.

2. Grade-equivalent norms assume that all students of a class or grade have more or less similar curricular experiences. This assumption may be true in the elementary classes but it may not be true for higher classes.

3. The grade-equivalent norm is not suited to those subjects in which there occurs rapid growth in the elementary classes and very slow growth in the higher classes. For example, in arithmetic there occurs rapid growth in the elementary grades but growth is slow in the higher classes.

Despite these limitations, grade-equivalent norms are common, particularly among achievement tests and educational tests. Such norms are also suited to intelligence tests.

Percentile Norms (or Percentile-rank Norms)
Before the meaning of percentile norms is explained, it is essential to look at the related terms: percentile ranks, percentiles, centiles, deciles and quartiles. As we know, a scale of percentile rank consists of 100 units, and the percentile rank of a score represents the percentage of cases in a distribution that had scores at or lower than the one cited. When a score is identified by its percentile rank, the score is called a percentile. Suppose, for example, that Mohan has a score (X) of 53 on an achievement test and 70% of the group (to which Mohan belongs) had a score of 53 or lower. Thus, Mohan's score of 53 would have a percentile rank of 70, or PR70, and his score would be called the 70th percentile. Note that percentile rank refers to a percentage and that percentile refers to a score. A centile is equivalent to a percentile and is the percentage rank that divides the distribution into 100 equal parts. Thus, the 50th centile is specified as C50 and is equivalent to the 50th percentile. A decile is the percentage rank that divides the distribution into 10 equal parts. The 7th decile (specified as D7) is equivalent to the 70th centile (percentile). The 1st decile is the 10th centile, etc. A quartile is a percentage rank which divides a distribution into 4 equal parts. There are 3 quartiles in any scale of percentiles: the 1st quartile (Q1), which is equivalent to the 25th centile; the 2nd quartile (Q2), which is equivalent to the 50th centile or 5th decile; and the 3rd quartile (Q3), which is equivalent to the 75th centile.
is equivalen . opul dc on type of norms used in psychological 
| ile norms are the most popular an common typ psy : 
Percen’ | tests. Such norms can be prepared tor either adults or children and for any type 
and SE anal norm indicates, for each raw score, percentage of standardization sample 
of tests: 4 a below that raw score, To illustrate, suppose Mohan has a score of 26 on the 
oe z aeons test and if 40% of the standardization sample secures below the score of 
Lhe _ has percentile rank of 40 or PR,» and percentile scores of 26. Percentile norms, 
i oa i basis for interpreting an individual's score on a test in terms of his own standing in 
thus: Oia sandardization sample. If the percentile norm is to be meaningful, it should be based 
— ; sample which has been made homogeneous with respect to age, grade, sex, oo ag 
sy tied factors, otherwise separate tables for percentile norms for each age, grade, sex an 
occupation should be prepared. . | 
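A table of percentile norms of the kind described above can be sketched in Python as follows. The standardization scores are hypothetical and were chosen so that a raw score of 26 receives a percentile rank of 40, as in the example of Mohan; the function name is an assumption. The sketch uses the "percentage falling below" definition given in this paragraph.

```python
def percentile_norms(standardization_scores):
    """Build a percentile-norm table: for each raw score that occurs,
    the percentage of the standardization sample falling below it."""
    n = len(standardization_scores)
    return {
        raw: 100.0 * sum(1 for s in standardization_scores if s < raw) / n
        for raw in sorted(set(standardization_scores))
    }


# Hypothetical standardization sample of ten examinees
sample = [20, 22, 24, 25, 26, 26, 28, 30, 31, 35]
norms = percentile_norms(sample)

print(norms[26])  # 40.0
```

An examinee's raw score is then interpreted by looking it up in the table rather than by comparing it with the number of items on the test.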
Sometimes percentile norms are also reported through the ogive, a graph showing the cumulative percentage of scores falling below the upper limit of each class interval. The ogive enables a person to read his score (plotted on the X axis) on the basis of the cumulative frequency percentage (plotted on the Y axis); or the percentile rank can be read on the basis of the person's score. Some of the important advantages of percentile norms are that they are easy to construct, easy to understand, and can be interpreted even by an untrained person. Despite these advantages, such norms have three distinct limitations.

1. Laymen as well as skilled persons sometimes fail to distinguish between the percentile and the percentage score, and the obvious result is confusion. The two should not be confused, because a percentile is a derived or converted score usually expressed in terms of percentages of persons whereas the percentage score is a raw score expressed in terms of percentages of items correctly answered or solved.

2. The biggest limitation of percentile norms is the inequality of units throughout the scale. Most raw scores approximate the normal distribution, and when they are expressed in terms of percentile norms, a single raw score difference at or near the middle of the distribution may lead to a change of several percentile ranks, whereas a large raw score difference at the extremes may produce only a very small percentile rank difference. In other words, raw score differences at or near the middle of the percentile scale are exaggerated in percentile ranks, whereas raw score differences at either extreme of the percentile scale produce a very small change in percentile ranks. Thus percentile scores distort raw scores. This point is illustrated in Figure 7.1, which reveals, among other things, that at the centre or mean of the distribution, the 5% of the cases between PR of 50 and 55 are shown by a tall narrow bar, whereas the 5% of cases at the extreme between PR of 90 and 95 make a broad low bar. Thus, the same number of percentile-rank points covers a wider range of scores at the extreme and a narrower range of scores at the centre or median of the distribution. In other words, a smaller (or narrower) raw-score difference (shown by the tall narrow bar) at the median produces an exaggeration in percentile ranks (that is, a difference of 5 percentile ranks), whereas only bigger raw-score differences (shown by the broad low bar) at the extreme are able to produce a change in percentile ranks to that effect. This means that the raw score differences at the extremes are highly shrunk in percentile transformation. Thus, the units of percentile norms are systematically unequal.

Fig. 7.1 Selected percentile ranks in normal distribution (horizontal axis: -3σ, -2σ, -1σ, M, +1σ, +2σ, +3σ; marked percentile ranks: 0.1, 2, 16, 50, 55, 84, 90, 95, 98, 99.9)

3. Percentile norms indicate only the person's relative position in the standardization sample. They convey nothing regarding the amount of the actual difference between the scores.
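The inequality of percentile units in a normal distribution can be verified with a short sketch. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names `pr_from_z` and `z_from_pr` are assumptions introduced only for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def pr_from_z(z):
    """Percentile rank (percentage of cases below) for a given z score."""
    return 100.0 * std_normal.cdf(z)


def z_from_pr(pr):
    """z score below which the given percentage of cases falls."""
    return std_normal.inv_cdf(pr / 100.0)


# Percentile ranks at whole standard deviations from the mean,
# approximately 0.1, 2.3, 15.9, 50.0, 84.1, 97.7, 99.9
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(z, round(pr_from_z(z), 1))

# Width, in standard-deviation units, covered by 5 percentile-rank points
width_middle = z_from_pr(55) - z_from_pr(50)   # about 0.13
width_extreme = z_from_pr(95) - z_from_pr(90)  # about 0.36
print(round(width_middle, 2), round(width_extreme, 2))
```

The same five percentile-rank points cover nearly three times as much of the raw-score scale between PR 90 and PR 95 as between PR 50 and PR 55, which is the tall narrow bar versus the broad low bar of Figure 7.1.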
Standard Score Norms
A norm which is based upon a standard score is known as a standard score norm. The reason why one needs standard score norms in place of percentile norms is that here the units of the scale are equal, so that they convey the same meaning throughout the whole range of the scale. In this way, standard score norms remove one of the serious problems common among percentile norms, namely the inequality of units.

In order to understand standard score norms, it is essential to first know the meaning of the standard score itself. A standard score, like the percentile score, is a derived score. It has a specified or fixed mean and a fixed standard deviation. There are several types of standard scores, such as the z score (also known as sigma score), T score, stanine score, deviation IQ, etc. Of these, the z score is very popular and has been frequently used in preparing norms. In fact, some writers use the term "standard score" interchangeably with the z score, but in reality this should not be so because the z score is just one of several types of standard scores.



Why are standard scores needed? Standard scores are needed primarily for two reasons. Firstly, when the performance of the same person on different tests is to be compared, it is best done by converting the raw scores into standard scores. Secondly, standard scores have equal units of measurement and their size does not vary from distribution to distribution. Hence, they are frequently used in interpreting test scores.

Raw scores can be converted into standard scores through two methods: linear transformation and normalized transformation. When raw scores are linearly transformed into standard scores, all characteristics of the original distribution of the raw scores are retained without any change in the distribution. As such, all statistical computations which can be carried out with the original distribution of raw scores can be duplicated with the standard scores. The z score is an example of a linearly transformed standard score. A z score indicates how many standard deviations a score lies below or above the mean of the distribution. A z score is a standard score which has a mean of zero and a standard deviation of 1. It is computed by subtracting the mean from the individual's raw score and dividing the result by the standard deviation (SD) of the distribution. In terms of an equation, a z score may be read as under:

$$z = \frac{X - M}{\sigma} \qquad (7.1)$$

where z = z score or sigma score; X = raw score; M = mean of the distribution; σ = standard deviation of the distribution.

To illustrate, suppose Mohan has obtained a score of 60 on a test and the mean and SD are 30 and 15 respectively. Then,

$$z = \frac{60 - 30}{15} = \frac{30}{15} = 2$$

This means that Mohan's score is 2 sigma or standard deviations above the mean of the distribution. Again, if another person has obtained a score of 20 on the same test, then

$$z = \frac{20 - 30}{15} = \frac{-10}{15} = -0.67$$

This means that his score is 0.67 standard deviation below the mean of the distribution. Likewise, if another person obtains a score of 30, it means his score is exactly equal to a z score of zero.
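Equation 7.1 translates directly into code. The sketch below, in Python, reproduces the three examples just given (raw scores of 60, 20 and 30 on a test with a mean of 30 and an SD of 15); the function name `z_score` is an assumption.

```python
def z_score(raw, mean, sd):
    """Equation 7.1: number of standard deviations by which a raw
    score lies above (positive) or below (negative) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


# Examples from the text: mean = 30, SD = 15
print(z_score(60, 30, 15))            # 2.0
print(round(z_score(20, 30, 15), 2))  # -0.67
print(z_score(30, 30, 15))            # 0.0
```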


The z score is often used in making comparisons. In fact, the transformation of raw scores into z scores is very useful when the researcher wants to compare scores from two different distributions. For example, suppose Bobby earned a score of 60 on the test of psychology. Further, suppose that the mean was 50 and the standard deviation (σ) was 10. On the geography test, Bobby's score was 56, the mean was 48 and σ = 4. Now the question is: In which class does Bobby have a better standing? At a glance, Bobby's psychology score is higher than his score in geography. Also, a look at the deviation scores shows that Bobby is 10 points above the mean in psychology and only 8 points above the mean in geography. Can we conclude that Bobby performed better in psychology than in geography? The reality is that we cannot compare the psychology score to the geography score because these scores come from different distributions. Because of this, a score of 60 in psychology means something different from a score of 56 in geography. Any direct comparison between these two test scores would be like the proverbial comparison of apples to oranges. In fact, for determining the class in which Bobby did better, it is essential that the distributions of both classes be standardized to make them similar so that meaningful comparisons can be made. Therefore, both scores are converted into z scores as follows:


$$z = \frac{X - M}{\sigma} = \frac{60 - 50}{10} = \frac{10}{10} = +1.0 \quad \text{(for psychology)}$$

$$z = \frac{X - M}{\sigma} = \frac{56 - 48}{4} = \frac{8}{4} = +2.0 \quad \text{(for geography)}$$

Now it is clear that Bobby's z score for geography was +2, which means that his score was two standard deviations above the class mean. On the other hand, his z score for psychology is +1.00, or one standard deviation above the mean. In terms of relative class standing, Bobby is doing much better in geography than in psychology, because z scores describe relative position within a distribution.
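The comparison of Bobby's standing in the two classes can be reproduced with a minimal Python sketch. The figures come from the example above; the dictionary layout and the names `bobby`, `standing` and `best_subject` are assumptions made for illustration.

```python
def z_score(raw, mean, sd):
    """Equation 7.1 applied to one raw score."""
    return (raw - mean) / sd


# subject -> (Bobby's raw score, class mean, class standard deviation)
bobby = {
    "psychology": (60, 50, 10),
    "geography": (56, 48, 4),
}

# Convert each raw score to a z score within its own distribution
standing = {subject: z_score(*values) for subject, values in bobby.items()}
best_subject = max(standing, key=standing.get)

print(standing)      # {'psychology': 1.0, 'geography': 2.0}
print(best_subject)  # geography
```

Although the raw score and the deviation from the mean are both larger in psychology, the smaller spread of the geography scores places Bobby relatively higher in that class.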
There are two advantages of the z score. First, the z score represents the most precise way of indicating position in a distribution. Second, although it is not sound to compute an average of ranks or percentile ranks, averaging is perfectly legitimate when z scores are used. For example, suppose a person has a z score of 1.25 on the mathematics test and a z score of -0.5 on a verbal comprehension test. The number of standard deviations away from both means, on the average, is (1.25 - 0.5)/2 = 0.75/2 = 0.375, that is, 0.375 standard deviation above the mean.

There are two difficulties associated with the use of z scores. First, a z score may be obtained in plus or minus form, as is obvious from the above example, which makes z scores cumbersome to handle. Second, z scores may be expressed in decimal points, which tends to further complicate handling them. In order to circumvent the first difficulty, the score is made larger by adding to each z score a constant, such as 50 or 100, so that the minus signs are omitted. The difficulty caused by decimal points is solved by using a larger standard deviation, that is, by multiplying every z score by 10 or 20. Such linear transformation of a z score into a new standard score is very common, particularly the system of having a mean of 50 and an SD of 10. The formula for such transformation of the z score is shown below.

$$\text{Standard score} = z \times (\text{new standard deviation}) + \text{new mean} \qquad (7.2)$$

To illustrate the use of this equation, let us consider the examples given in Table 7.1, in which the raw score has been converted into a z score and the z score has been converted into a standard score with a mean of 50 and an SD of 10.
Table 7.1 Transformation of z score into new standard score

Students    Raw Score    z Score    Standard Score (Mean = 50; SD = 10)
A           72           1.2        62
B           40           -2         30

Raw scores: Mean = 60; SD = 10

In Table 7.1, A's raw score of 72 has been converted into a z score through Equation 7.1, and the z score has been further converted into a standard score with a mean of 50 and an SD of 10 through Equation 7.2. Thus, for A,

$$z = \frac{72 - 60}{10} = \frac{12}{10} = 1.2$$

$$\text{Standard score} = 1.2(10) + 50 = 12 + 50 = 62$$

and for B,

$$z = \frac{40 - 60}{10} = \frac{-20}{10} = -2$$

$$\text{Standard score} = -2(10) + 50 = -20 + 50 = 30$$
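The two-step conversion of Table 7.1 (raw score to z score by Equation 7.1, then z score to a new standard score by Equation 7.2) can be sketched in Python as follows. The function name and its default arguments are assumptions; the figures are those of students A and B in the table.

```python
def to_standard_score(raw, mean, sd, new_mean=50.0, new_sd=10.0):
    """Convert a raw score to a standard score with a chosen mean and SD.

    Step 1 (Equation 7.1): z = (raw - mean) / sd
    Step 2 (Equation 7.2): standard score = z * new_sd + new_mean
    """
    z = (raw - mean) / sd
    return z * new_sd + new_mean


# Table 7.1: raw-score mean = 60, SD = 10
print(round(to_standard_score(72, 60, 10), 1))  # 62.0 (student A)
print(round(to_standard_score(40, 60, 10), 1))  # 30.0 (student B)
```

Because the transformation is linear, it removes the minus signs and decimals of z scores without changing the shape of the distribution or the relative distances between examinees.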





134 
rant uses to which the abcre linearhy transformed standard score can be peut 
ithe my pf the sarne person's pertormance on different teas Table 7 2 serves as an 
one “ *pariso" his table, the raw scores (shorn in P art |) nave been converted Into stancard 
a" 7 ‘ , 
in : = ai ths: In 450 and SD of 10 (shown in Part IN) through Equation 7.2. 
bs i ayn a 
at mes | ; 
m” Ys ais le 7-2 Linear transformation of raw scores into Standard woores* 
re Tab e - 


Part |: Raw Scores 






olor Soci 
120 


at 
100 70 35 
B 110 BO 45 
S 130 60 30 
S 100 60 35 
oe 20 10 5 
= 


res of only four students have been shown. 
«Raw 5CO 


Part ll: Standard Score 
Mean = 50;5D=10 









A 60 80 90 = 
B 50 60 90 ne 
ce 55 70 70 va 
D 65 50 60 om 


Part I of Table 7.2 shows raw scores of only four students out of eight students, and Part II shows their transformed standard scores. At the bottom of Part I are given the mean and SD of the raw scores. If you look at Part I, it conveys little meaning. What is the overall performance of student A on the three tests? On which test did he perform the best or worst? Which of the four students had the best overall performance? None of these questions can be answered. But when the raw scores are transformed into standard scores having mean = 50 and SD = 10, these questions can be easily answered. Student A had a standard score of 60 in psychology, that is, one standard deviation (mean = 50 and SD = 10) above the mean; in sociology he is 3 standard deviations above the mean and in education he is 4 standard deviations above the mean. Accordingly, it can be said that A's performance is the best in education, excellent in sociology and slightly above average in psychology. His overall performance over these three tests is 2.6 standard deviations above the mean (that is, 50). In a similar manner, the performance of each student can be compared. Of the four students, the best overall performance is that of A because his performance is 2.6 standard deviations above the mean while the performance of B, C and D is 1.6, 1.5 and 0.8 standard deviations respectively above the mean.
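The overall comparison described above can be checked with a short Python sketch. It assumes that "overall performance" means the average of a student's three standard scores, and the helper name `sd_units_above_mean` is introduced here only for illustration; the scores are those of Part II of Table 7.2.

```python
# Standard scores (mean = 50, SD = 10) from Part II of Table 7.2,
# in the order psychology, sociology, education.
standard_scores = {
    "A": [60, 80, 90],
    "B": [50, 60, 90],
    "C": [55, 70, 70],
    "D": [65, 50, 60],
}


def sd_units_above_mean(scores, mean=50, sd=10):
    """Average the standard scores and express the result in SD units."""
    average = sum(scores) / len(scores)
    return (average - mean) / sd


overall = {name: sd_units_above_mean(s) for name, s in standard_scores.items()}
best = max(overall, key=overall.get)

for name, units in overall.items():
    print(name, round(units, 2))
print("Best overall:", best)
```

Under this assumption the averages come out slightly higher than the figures quoted in the text for A, B and D, which appear to have been truncated to one decimal place; the ranking of the four students is the same.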
Normalized Standard Score 


So far we have discussed the linear transformation of raw scores to standard scores through the z score. Such standard scores have all the properties of the original distribution of raw scores, that is, examinee scores are related to raw scores through a linear function. One general demerit of linearly transformed standard scores is that they can be compared only when they have been derived from distributions that have approximately the same form. If one distribution is skewed and the other is normal, a comparison between the two sets of standard scores cannot be made. A direct solution is to convert the raw scores (which have a skewed distribution) into normalized standard scores and then compare the two standard scores derived from different distributions. Marshall & Hales (1972, 120) defined normalized standard scores as standard scores which have been adjusted to produce a normal distribution and converted to a standard base with a preassigned mean and standard deviation. Thus, normalized standard scores are standard scores which are expressed in terms of the normal distribution. Like linearly derived standard scores, normalized standard scores can be expressed with a mean of zero and a standard deviation of 1. A normalized standard score can, therefore, be interpreted in a similar way: a normalized standard score of +1 indicates that the examinee has surpassed 84% of his group and a score of -1 indicates that he has surpassed 16% of his group. There are three common normalized standard scores: T scores, stanine scores and the deviation IQ.

The T score was first devised by McCall (1922) and named after E. L. Thorndike, an important leader in educational measurement. The T score is defined as a normalized standard score based upon a mean of 50 and a standard deviation of 10 as well as upon the shape of the normal distribution curve. The T-score scale is based upon the normalized z score scale, which generally extends from -3 to +3 standard deviations with decimal fractions at various points in between. The T-score scale has a range of 20 to 80 in most distributions (Figure 7.2). When raw scores are transformed into T scores, their distribution automatically takes a shape which approximates a normal curve. In terms of a formula, the T score is equal to:
T = z(10) + 50 (7.3)
where z refers to the z score and is computed by Equation 7.1.

Fig. 7.2 Comparison of different types of standard scores in a normal distribution

z score                   -3    -2    -1    0     +1    +2    +3
T score                   20    30    40    50    60    70    80
Deviation IQ (SD = 15)    55    70    85    100   115   130   145
Deviation IQ (SD = 16)    52    68    84    100   116   132   148
Stanine                   1     2     3     4     5     6     7     8     9
Percentage of cases       4%    7%    12%   17%   20%   17%   12%   7%    4%




Equation 7.3 converts a raw score (not grouped into a frequency distribution) into a T score. If the scores have been grouped into a frequency distribution, a direct table¹ for conversion into T scores is available.
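A normalized T score can be sketched in Python as follows. This is one possible procedure and not necessarily the one used in the conversion tables mentioned above: it assumes the common midpoint convention for percentile ranks, converts each rank into a normal-curve z through the standard library's `statistics.NormalDist`, and then applies T = z(10) + 50. The function name and the sample scores are hypothetical.

```python
from statistics import NormalDist


def normalized_t_scores(raw_scores):
    """Convert raw scores to normalized T scores (mean 50, SD 10)."""
    n = len(raw_scores)
    unit_normal = NormalDist()  # mean 0, SD 1
    t_scores = []
    for x in raw_scores:
        below = sum(1 for y in raw_scores if y < x)
        ties = sum(1 for y in raw_scores if y == x)
        # Midpoint percentile rank as a proportion, always strictly
        # between 0 and 1 because ties >= 1.
        proportion = (below + 0.5 * ties) / n
        z = unit_normal.inv_cdf(proportion)  # normalized z score
        t_scores.append(10 * z + 50)
    return t_scores


# A hypothetical, positively skewed set of raw scores.
raw = [2, 3, 3, 4, 5, 7, 12, 25]
print([round(t, 1) for t in normalized_t_scores(raw)])
```

Because the z values are read from the normal curve rather than computed linearly from the raw scores, the resulting T scores follow an approximately normal shape even when the raw scores are skewed.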


Another normalized standard score is the stanine score, which was developed by the United States Air Force during World War II. The term 'stanine' is a contraction of 'standard nine', and stanine scores are expressed in digits ranging from 1 to 9. The mean of these scores is 5 and the standard deviation is 1.96 or approximately 2 (Figure 7.2). When raw scores are transformed into stanine scores, their distribution automatically takes a shape approximating the normal curve. As a matter of fact, stanines are the condensed scores on the C scale. In the C scale, there are 11 score points ranging from 0 to 10 with the mean lying exactly at 5. For computational convenience, the two points at the extremes (that is, 0 on the lower end and 10 on the upper end) are combined with the adjacent points, thus leaving only a nine-point scale (called the stanine scale).

Another variation of the C scale is the sten scale proposed by Canfield (1951), where there are 10 units: 5 units above the mean and 5 units below the mean.

Raw scores can be transformed into the stanine scale by arranging them in order of size and then assigning the percentage of cases for each stanine score point according to the normal curve (Figure 7.2). The first stanine covers 4%, the second stanine covers 7%, the third stanine covers 12%, the fourth stanine covers 17%, the fifth stanine covers 20%, the sixth 17%, the seventh 12%, the eighth 7%, and the ninth 4% of the total cases. When, for example, there are 300 scores earned by 300 students on a test, then the lowest 12 scores (4% of 300) would receive a stanine score of 1, the next 7% would receive a stanine score of 2, and so on.

Another normalized standard score is the deviation IQ, which has a mean of 100 and a standard deviation of about 16 (Figure 7.2). The size of the standard deviation is, however, dependent upon the intelligence test being used. The standard deviation varies from test to test, but a uniform standard deviation of 15 or 16 is most common. With variations in standard deviation units there will also be variations in the percentages of the distribution and, accordingly, variations in score points (Figure 7.2). With a mean of 100 and a standard deviation of 16, the deviation IQ of 132 would be 2 standard deviations above the mean and the deviation IQ of 84 would be 1 standard deviation below the mean. But when an SD of 12 is taken, the deviation IQ of 132 would be 2.66 standard deviations above the mean and the deviation IQ of 84 would be 1.33 standard deviations below the mean.

The deviation IQ score should be distinguished from the traditional IQ score (called the ratio IQ, which is obtained by dividing mental age by chronological age). Much confusion has been expressed regarding the meaning of these two terms because the deviation IQ is not really an intelligence quotient: it is a normalized standard score, and it is not calculated in the manner in which the ratio IQ is calculated. As a matter of fact, many experts have objected to the term 'deviation IQ'. The only justification in calling such standard scores deviation IQs is that they can be interpreted like IQs, provided the chosen SD on the test is equal to that of the previously known IQs. The first test to express its scores in terms of the deviation IQ was the Wechsler Intelligence Scale, in which the mean is 100 and the SD is 15. Nowadays, most of the intelligence tests use the deviation IQ score rather than the ratio IQ score in order to circumvent the problems posed by the latter. For a discussion of these problems, students are referred to Chapter 9.
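The effect of the chosen standard deviation on the meaning of a deviation IQ can be verified with a small Python sketch. The function name `sd_units` is an assumption made for this example; the IQ values of 132 and 84 are the ones discussed above.

```python
def sd_units(deviation_iq, mean=100, sd=16):
    """Express a deviation IQ as a distance from the mean in SD units."""
    return (deviation_iq - mean) / sd


# Compare the same two deviation IQs under SD = 16 and SD = 12.
for sd in (16, 12):
    high = sd_units(132, sd=sd)
    low = sd_units(84, sd=sd)
    print(f"SD = {sd}: IQ 132 -> {high:+.2f} SD, IQ 84 -> {low:+.2f} SD")
```

The same numerical IQ therefore stands for a different relative position whenever two tests use different standard deviations, which is why the SD of a test must be known before its deviation IQs are compared with those of another test.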
Local and Subgroup Norms 
Sometimes local and subgroup norms are constructed to suit the specific purpose of the test. Local norms are defined as those norms which are derived from a local group of examinees rather than from a national sample. A test having norms derived from the Hindi-speaking people of one locality rather than from the Hindi-speaking people of the whole of India is an example of local norms. Subgroup norms are those norms that consist of scores obtained from an identified subgroup, which may be formed with respect to sex, ethnic background, geographical region, environment, socio-economic level and many other similar factors.

Which of these two norms is more beneficial? The answer to this question depends upon the purpose of testing. For example, local norms for a standardized intelligence test may be considered superior in predicting the children's competence within their nonschool environment. However, they may not be very effective in predicting the children's competence in the mainstream school instructional programme of the whole country.

1. The interested reader may consult Guilford & Fruchter (1973, 469).


COMPUTER APPLICATIONS IN PSYCHOLOGICAL TESTING AND ASSESSMENT

The application and use of computers in psychological testing and psychological assessment has been a recent development. For testing and assessment, computers can be used in two basic ways: (a) in creating new tasks and perhaps measuring abilities that traditional procedures cannot tap; (b) in administering, scoring and interpreting traditional tests.

An early inspiration for using computers to generate tasks that one cannot present through traditional methods came, in fact, from the application of psychophysical and signal-detection procedures. In these procedures, a stimulus or signal is presented and the subject is asked to report his perception. These psychophysical and signal-detection procedures are being applied to ability testing (Jensen 1982; Nettlebeck 1982). Backward masking tasks (in which the stimulus is followed by a noise stimulus known as the mask) and reaction time are commonly used to measure the speed, capacity or efficiency of information processing (Saccuzzo, Larson & Rimland 1986). The prominent assumption is that variations among individuals who differ in psychometric intelligence tend to reflect different information-processing capabilities (Hunt 1980; Jensen 1986). Test developers have adopted the reaction time and backward masking tasks for presentation by personal computers (Saccuzzo & Larson 1987; Saccuzzo, Larson & Rimland 1986). Other investigators, with the help of standard software, can easily verify results from psychophysical and signal-detection procedures.


Still another application of computers is in the computerization of tests. Currently, there are numerous programmes for administering, scoring and even interpreting a host of tests (Baker 1989; Gutkin & Wise 1991). At the simplest level, those tests which are meant for group administration are now adapted for computer scoring. Apart from this, there is also increasing availability of computer disks which can be employed by test users to score tests on their own computers. The ASSIST programme developed by American Guidance Service is one such example. At the more complex stage, narrative computer interpretations of the test scores are available for certain types of tests. In such an approach, verbal statements associated with particular patterns of test responses are prepared by the computer programme. This approach has been widely used in personality and aptitude tests. In the MMPI, the test users may obtain computer printouts of the diagnostic and interpretative statements about their personality together with the numerical scores.


Individualized interpretation of test scores is also done by what are called interactive computer systems, in which the individual is in direct contact with the computer by means of a response pattern and is thus engaged in a dialogue with the computer (Katz 1974). In one such system, test scores are usually incorporated in the computer together with other information provided by the person. Subsequently, the computer combines all the available information about the individual with stored data and utilizes all relevant facts and relations in answering the individual's questions and in reaching certain decisions. A very good example of such an interactive computer system is SIGI (System for Interactive Guidance Information).


Researchers have reported that the presentation of the test content and scoring of the test by computers does not appreciably reduce the reliability or validity of test scores (Lee, Moreno et al. 1986). However, certain applications of computers may lead to misuses and misinterpretation of test scores (Butcher 1985; Matarazzo 1986). For guarding against such misuses, considerable attention has been given to developing important guidelines for computer-based testing. AERA, APA and NCME together have developed several standards for computer-based testing called The Testing Standards.

Most of the experts agree with the view that, pertaining to computerized testing, score comparability and narrative interpretive scoring are of major concern and deserve special attention. […] et al. (1991) have pointed out that when the same test is administered in computerized mode and in traditional printed mode, the comparability of the scores needs to be carefully investigated. The reliability and the validity of the test may also vary. Thus it can be concluded that unless the two modes are shown to yield fully equated test forms, the same set of norms may not be appropriate as well as applicable for both. Narrative interpretive scoring has raised some concern because sometimes it has been observed that adequate information is not provided to the users for evaluating the reliability, validity and other technical properties of the interpretive systems. Likewise, at other times it is also not clear how the interpretative statements were derived from the scores. In the absence of such guidelines, narrative interpretive scoring would no longer remain a meaningful affair.

The potential contribution of computers to psychological assessment is impressive (Gutkin & Wise 1991; Moreland 1992). Farrell (1992) has recently identified several major applications of computers to psychological assessment, specially to cognitive-behavioural assessment. These are (i) collecting self-report data, (ii) directly recording behaviours, (iii) coding observational data, (iv) training, (v) organizing and synthesizing behavioural assessment data, (vi) analyzing behavioural assessment data, and (vii) supporting decision making.

Computer technologies have also led to the development and rapid growth of new instruments for the assessment of cognitive functions, which are extensively being used in clinical neuropsychology as well as in the study of attention disorders and learning disabilities (Couch 1992). These instruments allow us not only to vary task presentation conditions more easily and to examine performance on different task components but also allow recording and analysis of response parameters such as timing in a way not possible with the traditional test. The MicroCog: Assessment of Cognitive Functioning is one of the best examples of a recently developed computer-administered battery, which is designed to screen for likely signs of cognitive impairment in adults (Powell & Whitla 1994).

However, some factors continue to impede the widespread application of computers in psychological assessment. First, the lack of acceptance by some practitioners is one of the major factors. Second, evaluation of software is another obstacle (Farrell 1992). Most of the vendors are reluctant to make their products available for review. Consequently, potential users do not have proper and sufficient information to evaluate the quality. Third, most systems of computer-assisted test interpretation tend to combine clinical and statistical procedures. This specific mix of quantitative data and clinical judgment affects the quality of the systems in the same way as the technical quality of the database and the clinical judgment. Apart from this, the needed information to evaluate a particular system is often unavailable because of proprietary concern.

Despite these obstacles, it is hoped that greater use of computers in both psychological testing and psychological assessment will be witnessed in future.


CRITERIA OF GOOD TEST NORMS 


We have seen that test norms are of different types. Regardless of the type of test norms, it is essential that their adequacy should be established. Test developers have set some criteria of good qualities that norms should possess. The following are the main criteria for judging whether or not the test norms possess good qualities:

1. Test norms should be representative: One of the important characteristics of test norms is that they should be truly representative of the group with which comparison is desired. Ideally, test norms should be based on a random sample taken from the population, but this is extremely difficult as well as expensive. Therefore, a compromise has to be settled for. At the minimum, all the subgroups of the population such as boys and girls, rural and urban areas, socio-economic status, caste groups, schools of varying size and so on, must be adequately represented in the sample.
2. Test norms should be relevant: Another good quality of norms is that they should be relevant. Since norms are based on various types of groups, they must be meaningful in light of the concerned group. Some test norms are based upon a national sample whereas some are limited to samples taken from a limited geographical region or state. The variety of groups available for comparison necessitates the study of the norm sample before using any table of norms. If the researcher wants to compare a sample or a student with a general reference group for diagnosing the strengths and weaknesses in different areas, national norms may be a good decision. But when the researcher is trying to decide such things as which students should be placed in a highly select group, which ones should be encouraged to pursue an engineering course and which ones should be encouraged to pursue a medical course, national norms are less fruitful, and for such decisions norms for each specific group are needed. A student might have an above-average aptitude and achievement when he is compared with students in general, but he may fall short of the ability needed to succeed in any accelerated group.


3. Test norms should be up to date: Good test norms should be up to date and they should not be based upon past uses. When achievement and education are going up, a given raw score will produce a higher percentile rank when compared to old norms than when compared to new norms. Therefore, old norms should be discarded and new norms should be preferred.


4. Test norms should be comparable: Another good quality of test norms is that they should be comparable. When the researcher wants to directly compare scores from different tests, such as when he compares aptitude and achievement test scores to identify underachievers or when he makes profile comparisons of test scores to identify the strengths and weaknesses of a student, comparability of test norms is needed because only comparable test norms can help the researcher in doing this desired function. Comparability of norms is assured when the norms of all tests have been developed on the same population. Whenever the researcher wants to compare scores from different tests directly, he should check the manual to ascertain whether the norms are based on the same group or whether they have been made comparable by any other means.

5. Test norms should be properly and adequately described: Test norms should provide information about the norm group, the norming procedures and other relevant factors. Such information is needed, and only then is one able to judge the appropriateness of test norms for any particular purpose. Test norms should provide at least the following information:
(i) What is the method of sampling?
(ii) What are the number and distribution of cases in the norm sample?
(iii) What are the characteristics of the norm group, especially regarding age, sex, socio-economic status, geographical location, education level, caste, social group, etc.?
(iv) What standard conditions of administration and motivation were maintained during testing?


Review Questions

1. What do you understand by norms of a test? Why are norms needed for a psychological test?
2. Discuss age norms and grade norms.
3. What are percentile norms? Discuss the advantages and disadvantages of percentile norms.
4. What is meant by standard score? Discuss the research utilities of different types of standard-score norms.
5. Discuss the major computer applications in psychological testing and assessment.
6. Discuss the major criteria for a good norm.


RESPONSE SET IN TEST SCORES


CHAPTER PREVIEW 


    Cautiousness
    Extremeness
• Implications of Response Sets
• Methods to Eliminate Response Sets
    Design of Test Items
    Adapting Test Difficulty
    Use of Correction Keys
    Modifying the Directions
    Use of Good and Appropriate Scoring Formulas
    Use of Easy Multiple-choice Items


MEANING OF RESPONSE SETS 


There are several factors which tend to invalidate test scores. Response bias is one of them. When a response to any test item tends to be altered by the examinees in a way that it indicates something other than that which the test is intended to measure, it is commonly called the response bias (Guilford 1954). A response set is a kind of response bias which occurs as a result of the examinees assuming certain mental sets. For example, the examinee may assume, for reasons best known to him, that only the 'Yes' alternative of 'Yes-No' is correct in a particular test, and accordingly he may answer all items in terms of 'Yes'. Cronbach (1946, 476) has defined response set as "the tendency causing a person to give different responses to items than he would when the same content is presented in a different form." To illustrate the definition of Cronbach, suppose that three examinees (A, B and C) have reported identically on a two-alternative item. They might respond differently when the same content is provided in another form (say, a multiple-choice item). Assuming that the content is the same, the difference may be said to be due to some kind of response set. For example, one subject might be guided by the social desirability tendency (faking-good), another by compulsion and another by the social undesirability tendency. Thus a response set tends to make the response different from that which the test intends to obtain. Response sets are evident in tests of ability, personality, interests, etc. There are different types of response sets. The following are the main types of response sets.


TYPES OF RESPONSE SETS

Acquiescence Set

Acquiescence is another response set found often in the case of the personality inventory and other inventories. It refers to a tendency to respond in some systematic way such as marking only 'Yes' or 'True' or 'Agree' (Yes saying) or only 'No', 'False' or 'Disagree' (No saying) on each item. Acquiescence is taken to be a continuous variable: at one end are the examinees who consistently give 'Yes' or 'True' answers and at the other end are the examinees who consistently give 'No' or 'False' answers. Acquiescence sets have several important consequences for the score of the test. If the test's 'Yes' or 'True' responses are keyed positively, guessing due to this set may enhance the total score more than the actual score. Likewise, if the test's 'No' or 'False' responses are keyed positively, guessing due to this set may enhance the total score more than the actual trait warrants. In either case, a distorted picture of the examinees emerges.

Faking-Good and Faking-Bad

Faking-good, also known as the social desirability tendency, is a very common response set in personality tests, service selection tests, etc. As its name implies, the social desirability tendency refers to a tendency on the part of the examinees to give answers in a way that tends to create a favourable impression of themselves. Thus the faking-good response set helps the examinees in increasing their chances of being selected by painting a good picture of the self. Such a tendency is very frequent in those tests which assess socially desirable traits like emotional stability, punctuality, trustworthiness, responsibility, etc. A faking-good tendency is also common in tests on the basis of which the examinees are to be selected for a job or to be admitted to a school or a college. It has also been observed that under certain circumstances the examinees are more motivated to fake bad, that is, to select answers that may give an unfavourable or socially undesirable impression of the self. Such a tendency is known as the faking-bad or the social undesirability tendency. A patient who is to undergo psychotherapy may give such answers on an adjustment inventory as show that he is more maladjusted. Likewise, a military officer who wants to return home may give responses on a personality inventory which make him appear more mentally ill than he actually is.


What is indicated by the tendency to choose a socially desirable response in a personality inventory? The common-sense viewpoint suggests that the tendency of faking-good indicates a kind of deliberate deception on the part of the examinees. But the empirical evidence is otherwise. According to Edwards (1957), who has done extensive research on the social desirability variable, such a tendency is nothing but a sort of facade effect or a tendency to "put oneself in a good front". He further revealed that the examinees are mostly unaware of such a tendency.

Crowne & Marlowe (1964) and Frederiksen (1965) held the view that the social desirability tendency was related to the need for social approval, self-protection, social conformity and the need for avoiding social criticism. Likewise, the social undesirability tendency (or faking-bad) may indicate the need for sympathy, attention and being helped by others in solving problems.

Experimental evidence has indicated that when the examinees are instructed to fake good on a test (that is, instructed to give responses in a manner that shows them in a better light), they obtain higher scores than when a neutral instruction is given for the test. The effectiveness of a test is drastically reduced when socially desirable responses are given.

When the items of a test are worded with terms such as often, sometimes, seldom, strongly agree, slightly agree, strongly disagree, etc., there is confusion among the examinees, because what one subject calls 'seldom' may be described by another as 'sometimes' or 'often'. Such confusion introduces error in the interpretation of the category, thereby affecting the total score.
Evasiveness

The tendency to mark '?', 'indifferent', etc., is known as evasiveness. Whenever the response alternatives are 'Yes-?-No', 'Agree-?-Disagree' or 'Indifferent', there is found a tendency among the examinees to evade the two fixed categories (Yes-No) in favour of the doubtful category. Somehow the examinees feel safe in marking this doubtful category. Cronbach (1946) has reported that a score which was wholly based upon the number of doubtful categories had a reliability coefficient of 0.73. Guilford had also demonstrated a coefficient for the tendency to mark the doubtful categories of responses. Obviously, such researches reveal a consistent habit on the part of the examinees, which must be controlled.
Speed versus Accuracy

The tendency to work for speed rather than accuracy is another common response set. Such a tendency is primarily observed in tests of abilities or specific ability. Examinees having speed-oriented motivation go through the test of ability (in which ordinarily there is a reasonable time limit for giving responses to the items) rapidly without caring much for accuracy. Lengthy tests of temperament (say, with 300 or more items) also encourage the examinees to respond speedily, ignoring the true content of the items. Such a tendency distorts the score to the extent that it fails to reveal the true picture of the trait being measured and thus, the validity of the test is lowered.


Tendency to Guess when Uncertain

The tendency to guess when uncertain about the correct response is also a commonly observed response set. This occurs mostly in the situation where items are very difficult to answer and/or where items are given in a multiple-choice format or in matching form. Ability tests are more subject to this type of response set than personality tests because in the former, items are often written in either multiple-choice form or matching form. Obviously, such a tendency contributes to the error variance of the score and, therefore, lowers the validity of the test.
Cautiousness 


Cautiousness refers to a tendency to leave the items untouched or unanswered (particularly in the ability test or the special ability test), specially when the correct answer is in doubt. Such a response set is just the opposite of the tendency to guess when uncertain, and a large number of such responses may adversely affect the test efficiency.
Extremeness 
Extremeness refers to a tendency to mark the points lying only at the extremes of the rating scale rather than points towards the middle of the scale. If this tendency works consistently in giving answers to the items, it may also lower the efficiency of the test.





Researches on the above response sets, particularly on social desirability, acquiescence and the like, have passed through several stages. In the beginning, these response sets were regarded as a source of error variance which must be eliminated from the test scores. Later, they were taken to be indicators of broad and durable personality characteristics worth measuring in their own right (Wiggins 1962). At this stage, they were commonly called response styles.
IMPLICATIONS OF RESPONSE SETS

We have discussed the common types of response sets exhibited in answers to tests of ability, personality, interests, etc. These response sets have certain important implications for the test. The important implications are briefly enumerated below.

1. Response sets may reduce the range of individual differences in test scores.

2. Response sets contribute to the error or irrelevant variance of the test scores. In other words, they tend to dilute the test with factors which are not part of the test and, therefore, reduce the validity of the test.

3. Response sets in a test become more obvious and marked when items become difficult, ambiguous or unstructured. Hence, such items should be avoided.

4. Response sets are consistent. Researches on response sets reveal that the response sets shown in a test are consistent from one administration to another and from one test to another, with some minor exceptions for a few examinees who may show change over these administrations. As a consequence of this consistency, these response sets are now considered as general traits which are worth measuring in their own right. Researches such as those of Jackson & Messick (1962) provide clear support for this view.

5. Since response sets are consistent from one time to another, they tend to raise the reliability of the test score, because this consistency also contributes to the true variance of test scores. But this contribution is very small as compared to their contribution to the error variance and, hence, such sets should be controlled.


METHODS TO ELIMINATE RESPONSE SETS

While the different response sets may, under certain conditions where the test is measuring carefulness or other traits of personality that are similar to the response set itself, contribute to logical and empirical validity, their general impact is to reduce the validity of the test. Cronbach has called response sets the enemy of validity. As such, they must be eliminated from the test. In general, the following methods may be adopted in controlling the response sets:

Design of Test Items


The first preventive procedure in controlling response sets is to use forms of test items which may discourage the operation of response sets. Forms of fixed-response items such as ‘Yes-No’, ‘?’, ‘True-False-Indifferent’ and ‘Agree-Neutral-Disagree’ [...] encourage certain response sets like acquiescence and evasiveness. To control these response sets, the test technicians have [...] the use of a 5-point or 7-point scale of judgement in the place of the above [...]. A scale requiring discrimination among 5 or 7 points tends to increase the reliability and the validity of the test scores by reducing the response set variance produced by the above two types of response sets, namely, acquiescence and evasiveness.





[...] terms or phrases which appear to them equally acceptable. The Edwards Personal Preference Schedule (EPPS) utilizes forced-choice items [...] in which examinees are to give two preferences in each item [...] descriptive of himself. But subsequent researches have shown that forced-choice items are not very effective in controlling the social desirability response set, and such items tend to introduce some administrative and technical [...]. Another way of controlling social desirability response sets, as suggested by Meehl & Hathaway (1946), is to use a social desirability scale like the K scale of the Minnesota Multiphasic Personality Inventory (MMPI). Scores on this scale may be correlated with scores on the items (being controlled) to give the index of social desirability. Items yielding a significant coefficient of correlation may be either reformulated or dropped. A third way, as suggested by Wiener (1948), is to select only those items which are of a relatively neutral type with respect to social desirability. For this purpose Edwards (1957) has constructed a 9-point scale [...]; the items having values ranging from 4 to 6 are considered as neutral with respect to social desirability. Social desirability scale values above 6 are taken to be indicative of the faking-good and below 4 to be indicative of the faking-bad. Edwards' 9-point scale is given in Table 8.1.


Table 8.1 Social desirability ratings scale

Ratings    Meaning of Ratings
1          Extremely undesirable
2          Strongly undesirable
3          Moderately undesirable
4          Mildly undesirable
5          Neutral
6          Mildly desirable
7          Moderately desirable
8          Strongly desirable
9          Extremely desirable
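
The cut-offs stated for Edwards' 9-point scale (values from 4 to 6 neutral, above 6 indicative of faking-good, below 4 indicative of faking-bad) can be sketched as a small item-screening routine. This is only an illustration: the function names and the item labels in the usage note are hypothetical and do not come from Edwards.

```python
def classify_sd_value(scale_value: float) -> str:
    """Classify a social desirability scale value on the 9-point scale."""
    if not 1 <= scale_value <= 9:
        raise ValueError("scale value must lie between 1 and 9")
    if scale_value > 6:
        return "faking-good"   # item is rated as socially desirable
    if scale_value < 4:
        return "faking-bad"    # item is rated as socially undesirable
    return "neutral"           # values from 4 to 6 inclusive


def neutral_items(scale_values: dict) -> list:
    """Return the labels of items whose scale value is neutral."""
    return [label for label, value in scale_values.items()
            if classify_sd_value(value) == "neutral"]
```

For instance, `neutral_items({"item_a": 2.1, "item_b": 5.0, "item_c": 7.3})` keeps only `item_b`, which is the kind of item that would be retained under this method.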
Adapting Test Difficulty 
A response set increases with the rise in the difficulty level of items. As such, it is essential that the difficulty level of items be in accord with the ability of the examinees. Tests should be designed in such a manner that if items prove to be difficult for a certain ability level of the examinee, another set of items which are relatively easier should be given to them. Tests of this type are very common in psychophysics.
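
The idea of switching to an easier set of items when the first set proves too difficult can be sketched as a simple two-stage routing rule. The rule below is an assumption made for illustration only: the threshold of 30 per cent correct and the function name are hypothetical, and the text prescribes no particular cut-off.

```python
def choose_next_set(first_set_responses, easier_set, harder_set,
                    min_proportion_correct=0.3):
    """Pick the next item set from performance on a first set of items.

    first_set_responses: list of booleans, True for a correct answer.
    min_proportion_correct: hypothetical cut-off below which the first
    set is judged too difficult for this examinee.
    """
    if not first_set_responses:
        raise ValueError("at least one response is required")
    proportion = sum(first_set_responses) / len(first_set_responses)
    # Too few correct answers: give the relatively easier items.
    if proportion < min_proportion_correct:
        return easier_set
    return harder_set
```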





Correction Keys

The correction keys have also been utilized for controlling response bias. For example, in the Minnesota Multiphasic Personality Inventory, where there are [...] statements to be answered by any of the three categories [...], there are four validating scales which are nothing but different types of correction keys. The four validity scales are the Question Score (?), the Lie Score (L), the Validity Score (F) and the Correction Score (K). These scales [...] faking-good, faking-bad, misunderstanding, evasiveness, etc. The first three scores (?, L and F) are usually used to evaluate the overall performance of the examinee on the test. If any of these scores goes beyond a certain specified value, the examinee's performance on the test is treated as invalid. The K score is a kind of ‘suppressor score’. A high K score indicates an attempt to fake-good and a low K score indicates self-criticism or an attempt to fake-bad. Since the K score is a ‘suppressor score’, it is computed as a correction to be added to scores to get a final total score. The revised edition of MMPI, called MMPI-2, has three additional validity scales: Back F (FB) scale, Variable Response Inconsistency (VRIN) scale and True Response Inconsistency (TRIN) scale. The FB scale provides a check on validity [...] throughout the test. A high score on TRIN scale indicates a tendency to respond with true to most items (Yes-saying) and a very low score indicates a tendency to respond with false (No-saying). A high score on VRIN scale indicates that a large number of items were scored in a logically inconsistent manner.
Modifying the Directions

Directions may be written or modified to control some of the response biases. For example, if in the directions it is clearly mentioned that of the total true-false [...], this would automatically control the tendency to answer only the true options (acquiescence set of the examinees). Likewise, if directions are given to the examinees that for each incorrect answer there is likely to be negative marking, it would control the tendency to guess when uncertain. Similarly, evasiveness may be controlled by directing examinees not to mark ‘?’ or ‘Uncertain’ or ‘Indifferent’ as far as possible because that is indicative of some undesirable tendency. Some investigators have demonstrated that examinees should be directed to answer all items and, as far as possible, [...] so that accuracy may not be lost because of speed. One danger in directing the examinees to answer all items is that it may increase random error but in many cases this [...].


Use of Good and Appropriate Scoring Formulas 


By developing good and appropriate formulas for the scoring of the correct items, incorrect items and omitted items, the response set variance may be reduced to [...]. According to Guilford (1954), the best solution is to obtain factor-analytic information on the different scores and, accordingly, to weigh them.
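
One widely used scoring formula that treats correct, incorrect and omitted items differently is the standard correction for guessing, S = R - W/(k - 1), where R is the number of right answers, W the number of wrong answers and k the number of alternatives per item. The text does not state this formula, so the sketch below is offered only as an illustration of what such a scoring formula looks like; the function name is hypothetical.

```python
def corrected_score(right: int, wrong: int, n_options: int) -> float:
    """Correction for guessing: S = R - W / (k - 1).

    Omitted items are neither rewarded nor penalised, so they do not
    appear in the formula.
    """
    if n_options < 2:
        raise ValueError("an item needs at least two alternatives")
    if right < 0 or wrong < 0:
        raise ValueError("counts cannot be negative")
    return right - wrong / (n_options - 1)
```

With true-false items (k = 2) every wrong answer cancels one right answer, whereas with five alternatives each wrong answer costs only a quarter of a point, which reflects the smaller chance of guessing correctly.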


Use of Easy Multiple-choice Items 


Two-alternative items such as ‘Yes-No’, ‘True-False’ and ‘Agree-Disagree’ encourage guessing on the part of the examinees. Moreover, the chance of success by guessing on such items is 50%. When multiple-choice items are used in place of the two-alternative items, the chance for guessing is automatically reduced and therefore, the response set variance is also reduced. Hence, the multiple-choice items have been regarded by many [...].

One point that should be taken care of in constructing multiple-choice items is that their difficulty must match with the ability of the examinees. If necessary, different multiple-choice tests for different levels of ability of the examinees may be constructed. Another point, which is likely to produce response set variance, is regarding the position of the correct answers of the item. Correct answers should not be given at the same position in each item; rather, their location should be randomly distributed. Mosier & Price (1945) have recommended that, to control response set variance in multiple-choice items, both right and wrong answers should be randomized with respect to their position in the given set of alternatives [...].
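
The two points made above, that more alternatives reduce the chance of a successful guess and that the position of the correct answer should be randomized, can be sketched as follows. This is a minimal illustration, not the procedure of Mosier & Price; the function names are hypothetical and each item is assumed to have distinct alternatives.

```python
import random


def chance_of_guessing(n_options: int) -> float:
    """Probability of hitting the correct answer by blind guessing."""
    return 1 / n_options


def shuffle_options(correct, distractors, rng):
    """Randomize the position of right and wrong answers for one item.

    Returns the shuffled alternatives and the index of the correct one,
    which serves as the scoring key.
    """
    options = [correct] + list(distractors)
    rng.shuffle(options)
    return options, options.index(correct)


def build_test(items, seed=0):
    """Shuffle every item; `items` is a list of (correct, distractors)."""
    rng = random.Random(seed)  # fixed seed so the key can be reproduced
    return [shuffle_options(correct, distractors, rng)
            for correct, distractors in items]
```

Keeping the returned index alongside each item means the scoring key stays correct however the alternatives happen to be ordered.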


Review Questions

1. What do you mean by response bias? Discuss the important types of response sets found in psychological tests.
2. Distinguish between faking-good and faking-bad. Point out the implications of response sets for psychological tests.
3. Outline a programme for the methods of eliminating response sets.




Part Two 
PRINCIPLES OF MEASUREMENT 


9
MEASUREMENT OF INTELLIGENCE, APTITUDE
AND ACHIEVEMENT


CHAPTER PREVIEW

• Different Viewpoints Towards Intelligence
    Intelligence As Whatever an Intelligence Test Measures
    Intelligence As General Intellectual Capacity
    Intelligence As a Combination of Groups of Traits or Factors
    Intelligence As Structure of Intellect
    Intelligence As Learning Ability
    Intelligence As a Two-level Process
    Intelligence As a Global Capacity

• Types of Intelligence Tests
    Stanford-Binet Scale
    Wechsler Scales
    Downward Extension of the WAIS-III: the WISC-III and the WPPSI-R
    Concluding Remarks on Wechsler Scales
    A Comparison between Binet Scales and Wechsler Scales
    The Kaufman Scales
    Nonverbal Intelligence Scales
    Some Indian Intelligence Tests

• Types of the Intelligence Test Scores
    Mental Age
    Intelligence Quotient

• Distinction between Aptitude Test and Achievement Test

• Types of Aptitude Tests and Achievement Tests
    Aptitude Tests
    Achievement Tests
    Essay-type Tests Compared to Achievement Tests
    Limitations of Achievement Tests

• Tests of Creativity


DIFFERENT VIEWPOINTS TOWARDS INTELLIGENCE

Tuckman (1975) in his ‘Foreword’ has said, “The modern history of testing is the [...] of intelligence or mental ability.” The term mental test was first used by J M Cattell, an American psychologist who studied with Wilhelm Wundt in Leipzig and [...] in Great Britain. In fact, Cattell was a champion of what is now known as the ‘brass instrument’ approach to psychological testing because he demonstrated heavy reliance upon instruments for measuring sensory thresholds and reaction times. This approach was based upon the assumption that keen sensory abilities were essential to intelligence. A student of Cattell [...] showed that sensory test results (that is, reaction time, naming of colours) bore little relation to college grades. With such revelations, psychologists abandoned the use of reaction time [...] as indices of intelligence. Today psychologists have made numerous attempts to define and measure intelligence. Despite their numerous attempts, the meaning of the term is not free from controversy, and there is not a single definition which is readily acceptable to all psychologists and educationists. In order to understand the meaning of this term, it is essential that the viewpoints of different experts be carefully examined. The experts whose viewpoints are presented below have defined intelligence (or its companion term ‘general mental ability’) from the different uses to which intelligence tests have been put.


Intelligence As Whatever an Intelligence Test Measures 

Boring (1923) defined intelligence as whatever the intelligence tests measure. In other words, Boring’s definition suggests that when we administer an intelligence test, we implicitly accept as intelligence whatever that test measures. This so-called definition was a favourite with those psychometricians who wanted to keep themselves away from theoretical disputes over intelligence. But this definition creates several difficulties. First, there are many tests for measuring intelligence. Which of these tests can be regarded as a standard measure of intelligence? The answer is not given by Boring’s definition. Second, many intelligence tests have a cultural bias and if Boring’s definition is accepted, people would have to define intelligence differently in different cultures and there would be no common meaning of intelligence. Thus, it will create more confusion among the experts as well as people in general.


Intelligence As General Intellectual Capacity 


Some psychologists have defined intelligence as a general intellectual capacity, which is made up of several discrete abilities. Binet (1919) has defined intelligence from this angle. According to him, intelligence is a general intellectual capacity which consists of the abilities: (i) to reason well with abstract materials (ii) to comprehend well (iii) to have a clear direction of thought (iv) to relate thinking with the attainment of a desirable end and (v) to be self-critical. Synthesizing the same, it can be said that he has defined intelligence as inventiveness that is dependent upon comprehension and marked by purposefulness and self-corrective judgement (Stanley & Hopkins 1978). Binet conceptualised intelligence as a single but complex mental process which can be measured by various kinds of materials designed to measure the integrated mental [...]. Binet did not really develop a theory of mental ability; rather, he reinforced that [...] just as physical abilities did. He also felt that the [...] mental [...] complex mental acts. That is the reason why he included [...] puzzles in his test, which he developed along with Simon. The second edition of the [...] tasks that were grouped into age levels, depending [...]. Binet died at a relatively young age in 1911, [...] was published. The same year William Stern, a German psychologist, introduced the term intelligence quotient for representing a child's [...] relative to his or her age-mates.

Spearman's (1927) viewpoints regarding intelligence are also placed under this category [...] a general intellectual capacity. He postulated the existence of a general factor (g) and a specific factor (s) underlying intelligence. All mental activities are dependent upon and are an expression of this g factor. He defined the g factor as a mental [...] that is required in all mental tasks and that is possessed by all individuals in varying degrees (because people differ in their mental activity). A positive correlation between two or more than two functions indicates the presence of the g factor. Another factor, called the s factor, is a specific factor underlying a particular type of mental activity. Such a factor is unique to the activity itself. A [...] correlation between two or more than two functions indicates the presence of the s factor. Of the two, the g factor is more important because it is an important measure of [...] intelligence. Spearman's view is known as the two-factor theory of intelligence, which states that all mental activities have some general component (called the g factor) in common, indicating general intellectual capacity, and each activity has its own specific component (called the s factor), with the general one being the more important one. Spearman proposed that the aim of any intelligence test should be to measure the amount of g factor of each person because it provides the most important basis for predicting a person's behaviour in different situations. It is of relatively little use to measure the s factor because it is unique to a specific activity. Tests which measure abstract relations, such as the Raven Progressive Matrices and the Cattell Culture Fair Intelligence Test, are also good measures of the g factor.

Note: Intelligence tests have been defined as tests [...] the general level of cognitive functions and hence are known as tests of general cognitive ability. Intelligence tests are also called IQ tests, scholastic aptitude tests [...] aptitude tests.

The most difficult problem faced by Spearman's two-factor theory is the existence of group factors. In 1906, Spearman and his colleagues had noted that dissimilar tests could yield correlations higher than the values predicted from their respective g loadings. This finding automatically raised the possibility that a group of diverse measures might share something common as a unitary ability other than g. For example, different tests might share a common unitary memorization factor that is really halfway between the g factor and the various s factors, which are unique to each test. Thus, the idea of the existence of group factors was incompatible with Spearman's two-factor theory.
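
Spearman's claim that a common g factor shows up as positive correlations between different mental functions can be illustrated with a small simulation. The sketch below is built on assumptions that are not in the text: each simulated test score is taken to be a weighted sum of one shared g value and one test-specific s value, and the loading of 0.7, the sample size and all names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_scores(n_examinees=500, n_tests=4, g_loading=0.7, seed=1):
    """Simulate test scores as g_loading * g + s_loading * s."""
    rng = random.Random(seed)
    s_loading = sqrt(1 - g_loading ** 2)
    scores = [[] for _ in range(n_tests)]
    for _ in range(n_examinees):
        g = rng.gauss(0, 1)          # general factor, shared by all tests
        for test in range(n_tests):
            s = rng.gauss(0, 1)      # specific factor, unique to this test
            scores[test].append(g_loading * g + s_loading * s)
    return scores


scores = simulate_scores()
correlations = {
    (i, j): pearson(scores[i], scores[j])
    for i in range(len(scores))
    for j in range(i + 1, len(scores))
}
```

Under these assumptions every pair of tests correlates positively, because the only thing the tests have in common is g; the specific factors are independent of one another and add nothing to the correlations between tests.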


Intelligence As a Combination of Groups of Traits or Factors 


Intelligence has also been understood on the basis of the combination of groups of traits or factors. Thurstone (1938) and L L Thurstone & T G Thurstone (1943) were the first persons to define intelligence on this basis and their theory is popularly known as the group-factor theory or multiple-factor theory of intelligence. They are of the opinion that intelligence is not an expression of the general factor, as postulated by Spearman; rather, it is an expression of the combination of groups of traits or factors. Such factors are intermediate factors, not so universal and common as the g factor nor so truly specific as the s factor. According to the group-factor theory, some common mental activities have a primary factor on the basis of which they are distinguished from other mental activities. This primary factor gives the common mental abilities (or factors) a functional cohesiveness and then, they are said to constitute a group. Another group of common mental activities is said to have another primary factor and so on. In this way, there are a number of groups of mental abilities (or factors), each of which has its own primary factor. On the basis of extensive factor-analytic research, Thurstone & Thurstone (1943) postulated that there are seven group factors which they designated as the Primary Mental Abilities (PMA). (These primary mental abilities do not incorporate the entire range of human abilities; they are mainly from those found in abstract intelligence and in academic learning.) Factor analysis is a statistical procedure that identifies a group of tests which correlate in such a way that they seem to share a common dimension.
1. The number factor (N) is the ability to do numerical calculations rapidly and accurately.
2. The verbal factor (V) is found in tests involving verbal comprehension.
3. The space factor (S) is involved in any task in which the subject manipulates an imaginary object in space.
4. The word fluency factor (W) is involved whenever the subject is asked to think of isolated words at a rapid rate.
5. The reasoning factor (R) is found in tasks that require the subject to discover the rule or principle involved in series or groups of letters.
6. The rote memory factor (M) involves the ability to memorise quickly.
7. Perceptual speed (P) is the ability to note visual details rapidly.
It is obvious, therefore, that the factor-analytic research of Thurstone and Thurstone has revealed that intelligence comprises a number of aptitudes or factors or mental abilities. Originally, each of these primary mental abilities was thought to be an independent factor but later, positive and significant correlations were demonstrated among these factors. The range of correlation was from 0.13 to 0.59. This meant that the primary factors or abilities were not the only factors in the operation of those mental activities which are assessed by various tests. They postulated that there must be some factors other than the primary factors or abilities to account for the positive correlations between those psychological tests which measure the above primary factors; they named these factors the second-order general factors. To this extent, they supported Spearman's viewpoint regarding some generality in the concept of intelligence. Thus, it may be concluded that intelligence is, to some extent, the expression of the combination of different factors (Thurstone) and, to some extent, the expression of some general factors (Spearman).

If we pay attention to the above interpretation, it is clear that Thurstone acknowledged the existence of the g factor as a higher-order factor. By this time, Spearman had also accepted the existence of group factors that represented special abilities. Thus, it became apparent that the difference between Thurstone and Spearman was largely a matter of focus or emphasis. Spearman continued to emphasize that g was the major determinant of correlations between test scores and ascribed a minor importance to group factors. Thurstone had reversed these priorities.

Vernon (1960) attempted to provide a rapprochement between Spearman's view and Thurstone's view by proposing a hierarchical group factor theory. According to Vernon, g was the single factor at the top of a hierarchy that included two major group factors labelled as verbal-educational (v:ed) and practical-mechanical-spatial-physical (k:m). Beneath these two major group factors were lying several minor group factors that resembled the primary mental abilities of Thurstone. Specific factors were lying at the bottom of the hierarchy.

In 1966, Raymond Cattell and John Horn developed their own viewpoints about intelligence and they postulated fluid intelligence and crystallized intelligence as the two major [...] intelligence. Fluid intelligence is a largely nonverbal and culture-reduced [...] ability [...] understood as problem-solving and information processing [...] independent of experience. Crystallized intelligence represents [...] through the investment of fluid intelligence in the culture (for example, [...] trigonometry in schools). Thus, crystallized intelligence is culturally dependent and is primarily [...] learned response. Since crystallized intelligence [...] fluid intelligence is applied to our life experiences derived from the cultural background, then [...] expected that the [...] kinds of intelligence will be correlated. It has been commonly [...] the tests of fluid [...] crystallized intelligence correlate moderately (r = [...]; [...] 2006). Because both [...] mental abilities are general and broad based, they were given the symbols Gf (general fluid ability) and Gc (general crystallized ability). In subsequent years, Horn and his colleagues expanded the number of abilities associated with the theory to nine, which are as follows:


1. Gf (fluid intelligence) such as ability to reason in novel situations
2. Gc (crystallized intelligence) such as [...] of general knowledge
3. Gq (quantitative ability) such as ability to comprehend and understand numerical concepts
4. Ga (auditory processing) such as ability to discriminate sounds and detect [...]
5. Gv (visual processing) such as ability to see spatial relationships and [...]
6. Gs (processing speed) such as ability to take quick decisions and maintain attention
7. Gsm (short-term memory) such as ability to use certain information over a short period of time
8. Glr (long-term memory) such as ability to transfer material to permanent storage and retrieve it later on
9. CDS (correct decision speed) such as ability to arrive at a correct decision soon


John Carroll (1993) undertook to reanalyze the data from more than 400 factor-analytic studies of abilities for finding out some common themes. The result was a hierarchical model similar to that of Vernon. One feature of this model was a three-stratum explanation of intelligence. General intelligence or g was at the top (stratum III) of the model; a small number of general-ability factors similar to those of the Cattell-Horn model was at the second level (stratum II); and a large number of specific factors describing the abilities required for performance on narrower test classifications was at the lowest level (stratum I). Recently, Carroll and Horn have expressed the view that there lies much similarity between Gf-Gc theory and three-stratum theory and therefore, they have named the combined theory the Cattell-Horn-Carroll (CHC) theory of intelligence.





Intelligence As Structure of Intellect 
Intelligence has also been defined on the basis of structural considerations of discrete factors. On the basis of his factor-analytic research of nearly 20 years, Guilford (1967) has proposed a three-dimensional boxlike model which he calls the structure of intellect model or SI model. The model has tried to simplify the picture of intellectual trait relationships by organizing the traits along three dimensions viz., contents, operations and products. Each of these aspects of intelligence was analyzed and separated into subcategories: five for operations, six for products and five for content, making a cube of 5 x 6 x 5 = 150 abilities. Shortly before his death, Guilford (1988) expanded his theory from 150 to 180 abilities when he divided the operation of memory into two categories, memory recording and memory retention (Wood & Wood 1996). Thus, 6 abilities for operations, 5 for content and 6 for products constituted 6 x 5 x 6 = 180 abilities. A description of these three categories and subcategories is given below.
OPERATIONS: It refers to the basic intellectual processes of thinking used by persons. It has the following six subcategories:
C Cognition: It includes discovery, rediscovery, recognition of information or some understanding.
M Memory Recording: It includes a person’s ability to readily encode the information.
  Memory Retention: It includes a person's ability to retain the encoded information.
D Divergent Production: It refers to the ability to search for multiple, creative or novel solutions to a problem.
E Evaluation: It means placing a value judgement on knowledge and thought.
N Convergent Production: It includes the ability to search for a correct solution to a problem.
CONTENT: It refers to a type of content or material on which operations are performed, that is, what the persons think about. It has the following five subcategories:
V Visual: It includes concrete visual materials.
A Auditory: It includes nature and characteristics of sound perceived.
S Symbolic: It includes letters, words, digits and other similar signs.
M Semantic: It includes verbal meanings or ideas.
B Behavioural: It includes knowledge regarding other persons.




PRODUCTS: It refers to the results of performing operations on contents, that is, the form of thought produced by individuals. It has the following six subcategories:
U Units: It refers to the production of a single word, definition or isolated bit of information.
C Classes: It refers to the production of a concept.
R Relations: It refers to the production of any form of relationship such as an analogy, an opposite or other similar ones.
S Systems: It refers to the production of an internally consistent set of classifications of various forms.
T Transformation: It refers to the production of changes in meaning, organization or some other arrangements.
I Implications: It refers to the production of such information as is beyond the data presented.


In the Guilford model each cell in the cube should represent a factor, and over 100 such factors have been identified so far. In this model each cell in the cube is identified by a three-letter code (cf. Figure 9.1) that refers to an operation, a content and a product in that order. In Figure 9.1 each of the four corner cells has been separated and shown by a separate cube to facilitate understanding. For example, CVU is one such cube on the upper left-hand side and it is cognition of visual units. Likewise, three other cubes are EBU (Evaluation of Behavioural Units), EBI (Evaluation of Behavioural Implications) and CBI (Cognition of Behavioural Implications). Let us illustrate the use of the model with an example. Suppose a little child is asked to locate the day of the week on a particular date in a calendar. This task obviously involves operations like Memory (M), Cognition (C) and Convergent thinking (N). In carrying out these operations he will be making use of the contents like Semantics (M), that is, reading and understanding the printed words and figures that may indicate the date and day of the particular month. By carrying out these mental operations with the help of contents, the child will arrive at the products. The day of the week to which the date in question refers represents the factor known as Relations (R).
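
The three-letter cell codes and the count of 6 x 5 x 6 = 180 abilities can be checked by enumerating the cube. The sketch below follows the letters given in the text, with one labelled assumption: the text gives no letter for Memory Retention, so the "R" used for it here is only a placeholder, and the function name is hypothetical.

```python
from itertools import product

# Operation letters as listed in the text; "R" for Memory Retention is
# a placeholder assumption, since the text gives no letter for it.
OPERATIONS = {
    "Cognition": "C",
    "Memory Recording": "M",
    "Memory Retention": "R",
    "Divergent Production": "D",
    "Evaluation": "E",
    "Convergent Production": "N",
}
CONTENTS = {
    "Visual": "V",
    "Auditory": "A",
    "Symbolic": "S",
    "Semantic": "M",
    "Behavioural": "B",
}
PRODUCTS = {
    "Units": "U",
    "Classes": "C",
    "Relations": "R",
    "Systems": "S",
    "Transformations": "T",
    "Implications": "I",
}


def cell_code(operation, content, product_name):
    """Three-letter code: operation, content and product, in that order."""
    return OPERATIONS[operation] + CONTENTS[content] + PRODUCTS[product_name]


# One code per cell of the cube.
cells = [cell_code(o, c, p) for o, c, p in product(OPERATIONS, CONTENTS, PRODUCTS)]
```

The same letter can recur in different positions (M is both Memory Recording and Semantic), which is why the order operation, content, product matters when a code is read.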







Fig. 9.1 Guilford’s structure of intellect (a cube with three axes: CONTENT, PRODUCT and OPERATION)


Although one test has been directly designed on Guilford’s model, namely, the Structure of Intellect Learning Abilities test (M Meeker, R Meeker & Roid 1985), the model has had [...] impact upon development and application of tests for general use.


Intelligence As Learning Ability

Intelligence has also been defined as the ability to learn. In this sense, an individual's intelligence is a matter of the degree to which he or she is educable. If a person is able to learn something easily and quickly, he is said to be an intelligent person. Piaget (1950, 1952) defined intelligence on the basis of assimilation and accommodation, which together determine a person's ability to learn. Assimilation denotes changes occurring in what has entered into the mind from the external world in such a way that they fit into a frame of reference, and accommodation denotes changes occurring in the internal structure of a person as a function of new experiences. If this definition of intelligence is taken as a base, major changes in the modern approach to intelligence testing would be required because the modern approach emphasizes most what a child has learned whereas this definition emphasizes what a child can learn as a result of proper instruction. Feuerstein (1975) has developed a procedure in which a child’s learning ability is measured after he is properly instructed for some time.


Intelligence As a Two-level Process

Intelligence has also been defined on the basis of a combination of two levels of processes. At one level is found associative intelligence and at another abstract intelligence. Jensen (1968) has defined intelligence from this viewpoint. Associative intelligence includes those kinds of tests that depend upon memory as well as simple verbal associations. Such intelligence may include factors like verbal associations, memory for temporal sequences, spatial positions, and stimulus-response learning. Abstract intelligence, on the other hand, includes factors like concept learning, thinking, problem-solving skills, multiple discrimination, principle learning, etc. (Gagne 1965; Freeman 1962). Most intelligence tests are a combination of some factors from associative intelligence and some from abstract intelligence. Associative intelligence relates to biological maturation and shows little variation with different social classes and races. Abstract or cognitive intelligence is dependent upon education and culture and is, therefore, responsible for the observed differences in the abilities among different social classes and races.


Intelligence As a Global Capacity 


Intelligence has been defined as a global capacity or a composite of several intellectual skills.
Wechsler (1944) and Stoddard (1943) have defined intelligence as such. According to Wechsler,
intelligence is “the aggregate or global capacity of the individual to act purposefully, to think
rationally, and to deal effectively with his environment”. One of the salient features of this
definition is that intelligence is displayed by the behaviour of the individual as a whole and that
intelligent behaviour is goal-directed and helps in making effective adjustment in the given
environment. Wechsler, thus, has included the concepts of “drive” and “incentive”, which are
implied in his phrases “to act purposefully” and “to deal effectively”. But his viewpoint has
been criticised by many experts who are of the view that drive and incentive are nonintellectual
traits of personality and that including them in a test of mental ability will create more
confusion (Freeman 1962). Similarly, Stoddard has defined intelligence as “the ability to
undertake activities that are characterized by (i) difficulty (ii) complexity (iii) abstractness
(iv) economy (v) adaptiveness to goal (vi) social value and (vii) the emergence of originals and to
maintain such activities under conditions that demand a concentration of energy and resistance
to emotional forces.” Of these seven characteristics, three, namely, economy, social value
and the emergence of originals, deserve special attention because these attributes have not been
discussed so far in any approach. Economy indicates the rate at which a mental task is done or a
problem is solved. If A solves a problem sooner than B, A would be regarded as more able and would
be said to have the characteristic of economy. Social value refers to whether
or not the mental tasks performed by a person are in accordance with socially
acceptable norms. If they are, the task is said to have the characteristic of social value. The




inclusion of social value in the definition of intelligence by Stoddard has caused much
controversy and confusion. Psychologists, in general, are of the opinion that social value is a
subjective element and therefore, it is difficult to judge whether or not a given mental task is
socially desirable or undesirable. As such, this attribute should be included neither in any
scientific definition of intelligence nor in any intelligence test. The emergence of originals refers
to a person's ability to discover something new and different, that is, to
creativity and originality. A few examples of such ability are to discover a new relationship or a
new scientific principle from the given data and to discover a new fact or theory concerning
psychological variables or social facts. Stoddard's definition has, however, been criticised
because of the inclusion of the two conditions of intelligent behaviour, that is, concentration of
energy and resistance to emotional forces. It is said that motivational and emotional traits are
nonintellectual traits and should neither be treated as aspects of mental abilities nor be
included in any intelligence test. To include them would be to confuse and invalidate the measures
of intelligence.


Intelligence As a Process of Integrating Physiology and Information Processing
Das, Naglieri and Kirby (1994) explained intelligence as a process of integrating physiology and
information processing. Their model is known as the PASS theory (or model) of intelligence. On the
basis of the neurophysiological model of the brain proposed by A R Luria, these experts have divided
intellectual performance into four basic processes: Planning (P), Attention (A), Simultaneous
scanning (S) and Successive processing (S), popularly synthesized as the PASS model. The model
relates these four processes to specific neurological structures or areas, which Luria called
functional units (Das & Naglieri 2001). Luria described the cognitive processes within the
framework of three functional units. The function of the first unit is cortical arousal and attention;
the second unit codes information using simultaneous and successive processes; and the third unit is
responsible for planning, self-monitoring and structuring of cognitive activities. This work of
Luria on functional aspects of the brain structures became the basis of the PASS model.




The most important and basic function is attention. A person must attend to a stimulus in
order to process the information. Any inability to focus attention on the stimuli is seen as one
primary reason for poor intellectual performance. Once the attention is focused on the stimuli,
the resulting information may require either simultaneous focusing or successive focusing.
Information that is subject to simultaneous scanning is said to be surveyable because its elements
are interrelated and accessible. Simultaneous processing is involved in language comprehension
and other such tasks that require the perception of the whole stimulus situation at once. Successive
processing is required whenever the elements of the task have to be completed in a particular
order. Tasks like serial recall of stimuli and performing some skilled movements require
successive processing.

The planning part of the model is involved in deciding where to focus attention and what
type of processing is required by the given task. This portion is very similar to Sternberg's
metacomponents. This planning process also monitors the success of problem solving and, if
needed, modifies the approach until a solution is arrived at. These four processes, in turn, operate
within an existing knowledge base, which is thought to be the cumulative result of an individual's
experiences stored in the memory. Since all four processes operate within this
knowledge base, the base of information tends to influence all cognitive processes (Das, Naglieri
& Kirby 1994).

In conclusion, it can be said that the term ‘intelligence’ is difficult to define in a way
which is acceptable to all. However, intelligence can be understood as a general set of
traits, which is often reported as mental abilities. Often, intelligence tests are distinguished
from tests of creativity, which measure a similar but different aspect of the mental process. Mental
ability or intelligence tests measure convergent thinking whereas tests of creativity measure
primarily divergent thinking. Convergent thinking is one where the individual tends to reduce a
large amount of information to one or two possible correct answers whereas in divergent
thinking, the individual tends to expand from a limited amount of information to many correct
answers. The Torrance tests of Creativity and the Remote Associate tests are two common examples of
tests of creativity, which are discussed a little later in this chapter.


TYPES OF INTELLIGENCE TESTS


Intelligence tests have been classified from different angles. From the point of view of their
administration they are divided into two convenient categories: individual tests and group tests.
An individual intelligence test, as its name implies, is one which can be administered to one
person at a time. The first individual intelligence test was the Binet-Simon scale. A group
intelligence test is one which can be administered to more than one person at a time, that is, it
can be administered to a group. The first group intelligence test was developed when the
need for mass testing was realised during World War I. The Army Alpha test and the Army
Beta test were the first group tests, the former being a verbal one and the latter being a
nonlanguage one. Besides the above simple comparison, the individual test and the group test
may be compared on the following points so that the two may be recognized as two distinct tests:

1. The individual test requires a highly skilled and experienced test administrator. He must
have a specialized knowledge of testing procedures and test administration. The group test, on
the other hand, being mostly self-explanatory in nature, does not require a well-trained and
skilled administrator. Persons having a moderate experience with testing procedures can do well
in the case of administration of the group test.


2. Individual tests are mostly applied in clinical settings whereas group tests are mostly
applied in educational settings, industries, the civil services and military services. A person who
is discouraged, withdrawn and has a strong sense of inferiority and guilt cannot be tested
properly in a group. However, in an individual session with an experienced administrator he can
be readily diagnosed with the help of the appropriate test.

3. Individual tests are most suitable for very young children whereas group tests are most
suitable for adolescents and adults. With very young children, that is, children of preschool age,
only individual tests can be administered. No group test can be successfully administered to
them. There are two reasons for this. First, children of preschool age are distracted very easily.
If they are asked to sit in a group, they will be even more distracted. The only solution is to test
them individually, where the test administrator can have better control over the distraction of a
child. Second, such children cannot be motivated to do well in group tests. The administrator
cannot encourage and motivate each child of a group to the extent he deserves. In an individual
session, the same task becomes easier for him.


4. Individual tests are usually more difficult to construct than group tests. The construction
of an individual test is a time-consuming process and an expensive job. Preparing test items for
individual tests, standardizing those items on a suitable representative sample, and then finding
the proper instructions, time limits and method of scoring are definitely arduous, noneconomical
and time-consuming processes. This is specially so because the entire process of test construction
is to be carried out in individual sessions. It is, therefore, advisable that individual test
construction should not be undertaken unless the test constructor is ready to expend a
considerable amount of money and time in test construction.
5. In group tests, the administrator has very little opportunity to encourage and
motivate the examinees and this is especially true when the group is large. Not only this, in the
group test the administrator cannot readily recognize those examinees who are being influenced
by some transient conditions like headache, fatigue, worry, etc. In individual tests, the
administrator has a better chance to detect the influence of transient conditions on the
examinees, and he is also better able to establish a rapport and obtain cooperation from the
examinee.
6. The norms of group tests are more dependable and established than the norms of
individual tests. Because of ease in data collection, the norms of group tests can be based on a
very large sample but in case of the individual tests, norms cannot be based on a very large
sample because of the difficulty and the time-consuming nature of data collection.
Intelligence tests have also been classified on the basis of the nature of items used in the tests.
The verbal test (or paper-and-pencil test) and the performance test are the natural outcomes of
such a classification. Strictly speaking, a verbal test is one in which the instructions and items are
reproduced usually through the written language before the examinees. It automatically follows,
then, that for a verbal test the examinee must be literate because he will have to use a pencil and
paper for answering the items. Such tests are also called paper-and-pencil tests of intelligence by
laymen. All verbal tests by their nature are group tests. The Army Alpha test and the Army General
Classification test are examples of verbal tests. The Mohsin General Intelligence test, the [...]
General Mental Ability test and Jalota's Group General Mental Ability tests are some
examples of verbal tests developed by Indian psychologists.
A performance test is one where language is used only to impart instruction and items are
manipulative in nature. The examinee is required to answer by manipulating the given test
materials. No written language is needed to answer the items. Thus a performance test is not a
paper-and-pencil test. It can be administered to literates as well as illiterates, speech defectives,
preschool children who cannot use language, the culturally deprived, etc. All
performance tests by their nature are individual tests. Examples of performance tests are the Army
Beta test, the Pintner-Paterson scale, the Cornell-Coxe scale, the Arthur Point scale, the
Pass-along test, and the Kohs Block Design test.
In addition to the verbal test and the performance
test, there is another important class of intelligence test commonly known as the nonverbal test or
the nonlanguage test or the culture-free test. By definition, a nonverbal test or a nonlanguage test
is one where no language is used at all either in instruction or in construction of items. Usually,
the instruction is given through pantomime, gestures, blackboard demonstrations and charts and
the items are neither verbal nor manipulative types. Test items are usually of the figure-relation type
where the examinee is to discover, without the actual manipulation of objects, the relationship
between various figures and designs. In a nonlanguage test, the use of paper and pencil is
involved. This makes it similar to a verbal test and distinct from a performance test. The examinee
uses the paper and pencil only to underline or cross out the items, which do not require any
ability to read or write. The use of paper and pencil in a nonlanguage test is limited to making
only nonlinguistic marks. Nonlanguage tests are usually group tests and are often given when
the examinees are incapable of being tested with the usual tests of intelligence. Such
tests are most suitable for subnormals, the physically handicapped, foreign-language speakers, etc.
One of the biggest advantages of a nonlanguage test is that it can be administered to persons
belonging to different cultures. Hence, it is also sometimes known as a culture-free or culture-fair
or cross-cultural or culture-reduced test of intelligence. The first nonlanguage test was developed
during World War I for illiterate soldiers. The name of the test was the Army Examination Beta.
Some of the other tests which are regarded as culture-free tests are the Leiter International
Performance scale, the Culture Fair intelligence test by Cattell, the Raven Progressive
Matrices test and the Goodenough Draw-a-Man test, now known as the Goodenough-Harris
Drawing test. But strictly speaking, these tests are not free from cultural influences. As a matter of
fact, no one has fully succeeded in constructing a culture-free test in the true sense of the term.
The only thing that can be said in favour of these tests is that they share the maximum experience
common to a large number of cultures. Hence, the more appropriate name for them would be
Culture-Common or Culture-General tests rather than culture-free tests.




A detailed description of some very important individual intelligence scales is
presented below.


Stanford-Binet Scale


A very important step in the development of the individual intelligence test was taken by Alfred Binet
(1857-1911), who established the first psychological laboratory in 1889 in France. He was asked
by the Government of France to devise a means for diagnosing slow learners and mentally
retarded children in Paris schools. Working with Simon, a physician at the asylum at Saint-Yon,
Binet developed an intelligence test for this purpose. The test is known as the Binet-Simon scale
and it was published in 1905. The Binet-Simon scale consisted of 30 items, which were
arranged in ascending order of difficulty. The scale was a crude measure of the intelligence of
school-going children. In 1908, Binet and Simon revised the scale in order to remove some of its
defects. The 1908 Binet-Simon scale was the first age scale and it created considerable interest
among psychologists working in different countries like Germany, England, Belgium,
Italy, and the United States. As a result, many suggestions and criticisms were made.
In the light of these suggestions and criticisms, Binet and Simon further revised the scale in
1911, in which the age range was extended from three years to the adult level. This was the last
revision by Binet in his lifetime.
In the United States of America, several adaptations and translations of the Binet-Simon
scale were developed. The most important of these adaptations, which are today very popular,
were done at Stanford University. There were, however, four other revisions in the United States
of America. In 1908, H H Goddard published an English translation of the 1905 Binet-Simon
scale, and in 1911 he published a revised version of the 1908 Binet-Simon scale. Other revisions
in the USA were made by F Kuhlmann in 1912 and 1939; R Yerkes in 1915 and 1923; and J P
Herring in 1922. But these four revisions are remembered today only for their historical interest.
The first important American revision of the Binet-Simon scale, which gained prominence, was
done by Terman and his associates at Stanford University in 1916. In this revision, the
Binet-Simon scale was given almost a new look. More than one third of the items were new and
the entire scale was restandardized on an American sample of 1400, of which 1000 were children
and 400 were adults. The most important aspect of this revision was the concept of IQ, which
was for the first time introduced in a psychological test. In 1937, Terman along with Merrill revised
the 1916 Stanford-Binet test; this revision is commonly referred to as the 1937 Binet. This revision
comprised two equivalent forms, L and M. The 1937 Binet was better standardized and better
validated than the 1916 revision and it also extended the upper and lower range. The third
revision of the Stanford-Binet scale was done in 1960. This revision consists of a single form in
which the test items from both L and M forms were retained. Hence, the 1960 Binet is known as
the L-M form. Generally, the 1960 revision has retained the important characteristics of the
1937 scale. However, it has dropped the unsatisfactory subtests of the 1937 revision, and has
changed a few items. The 1960 revision ranges from a mental age of 2 years to a mental age of
a little over 22 years. The single L-M form is now available with norms based on more recent data.
According to Sattler & Theye (1967), this form measures abilities from seven
areas: language, reasoning, memory, social intelligence, conceptual thinking, numerical reasoning
and visual-motor. Test items are in the form of words, objects and pictures and the responses given
by the examinees are in the form of drawing, calculating, writing, and speaking. About an hour is
required for its administration. In this revision, intelligence is expressed in terms of the standard
score of the deviation IQ.


The next stage was the 1972 restandardization of Form L-M. The 1960 revision had failed
to include a new normative sample or restandardization. However, by 1972, a new
standardization group consisting of a representative sample of 2100 children (about 100 at each
Stanford-Binet age level) had been obtained for use with the 1960 revision (Thorndike 1973).
One basic feature of the 1972 norms was that, unlike all previous norms, they included nonwhites.














The new edition, that is, the fourth edition of the Stanford-Binet (SB4), appeared in 1986. This
edition, in fact, represents its most extensive revision (Delaney & Hopkins 1987; Thorndike,
Hagen & Sattler 1986). This revision, on the one hand, retained the chief advantages of the earlier
editions as individually administered tests and, on the other hand, it also reflected the recent
developments in both theoretical conceptualizations of intellectual functions and the methodology
of test construction.


This modern Binet scale incorporates the gf-gc theory of intelligence (Horn & Noll 1997).
According to this theory, there are two basic types of intelligence: fluid (gf) and crystallized (gc).
Fluid intelligence can best be thought of as those abilities that allow the person to reason, think
and acquire new knowledge. Crystallized intelligence, on the other hand, refers to the
knowledge and understanding that the person has acquired. The modern Binet scale is based
upon three levels of a hierarchical model as shown in Figure 9.2.


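[Figure 9.2: hierarchical diagram. Level 1: g (general intelligence). Level 2: Crystallized abilities,
Fluid-analytic abilities, Short-term memory. Level 3: Verbal reasoning and Quantitative reasoning
(under Crystallized abilities); Abstract/visual reasoning (under Fluid-analytic abilities).]


Fig. 9.2 Three-level model of Binet’s modern scale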


As is obvious from Figure 9.2, at the top of the model is the g-factor (or general intelligence)
which, in fact, reflects the common variability of all tasks. At the next level of the model are three
group factors: Crystallized abilities reflecting learning, Fluid-analytic abilities reflecting the
original potential that a person uses to acquire crystallized abilities (Horn 1994; Horn & Cattell
1966) and Short-term memory reflecting a person's capability to retain information briefly after
a single brief presentation. Crystallized abilities are further divided into two subcategories:
verbal reasoning and quantitative reasoning. Fluid-analytic reasoning is reflected in
abstract/visual reasoning. The component of verbal reasoning is assessed by four tests,
quantitative reasoning is assessed by three tests, abstract/visual reasoning is assessed by four tests
and short-term memory is assessed by four tests. In this way, Binet's modern scale consists of 15
subtests as shown in Figure 9.3.


Binet's modern scale permits the calculation of specific scores for each of the 15 tests. Thus
in addition to an overall score that reflects g, the test users can obtain scores related to each
specific content area. In order to avoid having wider content unevenly distributed across age
groups, the authors dropped the age scale format in this version. Now in place of the age scale, items
having the same content were placed together into any one of 15 separate tests. For example, all
matrix items were placed together in one test and all vocabulary items were placed together in
another test. In addition, each of the specific 15 tests can be grouped into one of four content
areas. This scale measures intelligence in individuals aged 2 years through adult. Due to this
wide age range, all items in a given test will not be appropriate for all subjects. Therefore, in
administering the 1986 scale, [...]





> 




Content area                    Related tests
1. Verbal reasoning             Vocabulary test; Comprehension test; Absurdities test;
                                Verbal relations test
2. Quantitative reasoning       Quantitative test; Number series test; Equation building test
3. Abstract/visual reasoning    Pattern analysis test; Copying test; Matrices test;
                                Paper-folding-and-cutting test
4. Short-term memory            Bead memory test; Memory for sentences test;
                                Memory for digits test; Memory for objects test


Fig. 9.3 Four major areas of Binet’s modern scale and the related tests


The test users must also establish the basal age for each test, that is, the lowest level
where two consecutive items of approximately equal difficulty are passed. In this way testing
proceeds within each test until a ceiling is reached, that is, the point at which at least three out
of four items are missed. The concept of age differentiation has been retained, that is, items are
presented in order of increasing difficulty according to their ability to discriminate different age
groups. Thus age differentiation refers to the fact that with increasing age, children develop their
abilities.
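
The basal and ceiling rules described above can be illustrated with a small sketch. It is a simplification made for illustration only: it assumes that the items of one test are listed from easiest to hardest as pass/fail results, and it ignores the grouping of items into levels used in the actual scale. The function name `find_basal_and_ceiling` is hypothetical.

```python
from typing import List, Optional, Tuple


def find_basal_and_ceiling(passed: List[bool]) -> Tuple[Optional[int], Optional[int]]:
    """Return (basal_index, ceiling_index) for one test.

    `passed` holds one True/False result per item, ordered from easiest
    to hardest. Either value is None when the rule is never satisfied.
    """
    # Basal: the lowest position where two consecutive items are passed.
    basal = None
    for i in range(len(passed) - 1):
        if passed[i] and passed[i + 1]:
            basal = i
            break

    # Ceiling: the first run of four consecutive items in which
    # at least three are missed; testing would stop at its last item.
    ceiling = None
    start = 0 if basal is None else basal
    for i in range(start, len(passed) - 3):
        window = passed[i:i + 4]
        if window.count(False) >= 3:
            ceiling = i + 3
            break

    return basal, ceiling


# Illustrative (invented) record of one examinee on a nine-item test.
record = [True, True, True, False, True, False, False, False, False]
basal_index, ceiling_index = find_basal_and_ceiling(record)
print(basal_index, ceiling_index)
```

In this invented record the basal falls on the first item and the ceiling on the seventh item (index 6), so the last two items would not need to be administered.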


Binet’s modern scale retains the use of standard scores. Accordingly, raw scores on each of the
15 tests can be converted to a standard age score with a mean of 50 and a standard deviation of 8.
The subgrouping of individual tests into four content areas permits one to calculate four
area-content scores, each with a mean of 100 and a standard deviation of 16.
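
The arithmetic behind such standard scores can be sketched as a linear transformation of a z-score. This is only an illustration: in practice the conversion is read from the published age-wise norm tables, and the norm means and standard deviations used below are invented numbers, not values from the test manual.

```python
def to_standard_score(raw: float, norm_mean: float, norm_sd: float,
                      target_mean: float, target_sd: float) -> float:
    """Rescale a raw score to a standard score with the given mean and SD."""
    z = (raw - norm_mean) / norm_sd          # distance from the norm-group mean in SD units
    return target_mean + target_sd * z


def standard_age_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Standard age score for a single test: mean 50, SD 8."""
    return to_standard_score(raw, norm_mean, norm_sd, 50, 8)


def area_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Area-content score: mean 100, SD 16."""
    return to_standard_score(raw, norm_mean, norm_sd, 100, 16)


# Hypothetical norms: an examinee one SD above the age-group mean.
print(standard_age_score(raw=30, norm_mean=24, norm_sd=6))   # 58.0
print(area_score(raw=96, norm_mean=80, norm_sd=16))          # 116.0
```

An examinee who scores one standard deviation above the mean of the age group thus obtains 58 on a single test and 116 on an area score, whatever the raw-score units happen to be.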


Like any other individually administered test, the 1986 revision of the Stanford-Binet is a good
measure of general intelligence. Its norms were developed on an adequate standardization sample.
Administration and scoring instructions are clear and the test discriminates well among normal,
gifted and neurologically impaired samples.

However, there are some limitations. First, there exists disagreement about the factor
structure of the instrument and this ultimately raises a question about the appropriate
interpretation. It appears that the factor structure shifts from a two-factor to a four-factor
organization as the sample advances in age. Second, it is often a difficult task to establish a basal
level with younger, intellectually impaired children. Finally, the administration of the test is
somewhat more difficult than other tests since the examiner has to score each item as the test is
administered.







The latest edition is the fifth edition of the Stanford-Binet test (SB5) and it was published in
2003 by Roid. The items of this new scale were reviewed for gender, ethnic, cultural
and socio-economic biases. The fifth edition is organized in a little different way and provides
Full scale IQ, Verbal IQ and Nonverbal IQ and five factor scores.
Another change in the fifth edition is that the standard deviation for the IQ scores
is 15 as opposed to 16, which was the tradition with earlier versions.

The organization of the SB5 test rests on the view that
intelligence may be assessed in two distinct domains, Verbal and Nonverbal. The five factors
assessed in each of the domains are fluid reasoning, knowledge, quantitative reasoning,
visual-spatial processing and working memory. When these five factors are crossed over the two
domains (verbal and nonverbal), the result
is an instrument with ten subtests: nonverbal fluid reasoning, verbal fluid reasoning,
nonverbal knowledge, verbal knowledge, nonverbal quantitative reasoning, verbal quantitative
reasoning, nonverbal visual-spatial processing, verbal visual-spatial processing, nonverbal
working memory and verbal working memory. Thus SB5 provides ten subtest scores (with a mean of 10
and SD of 3), three IQ scores (full scale IQ, verbal scale IQ and nonverbal scale IQ) and five
factor scores (fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing and
working memory). The IQ and factor scores have norms based upon a mean of 100 and SD of 15.
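
The way in which five factors crossed with two domains give ten subtests can be made concrete with a short sketch. It only enumerates the combinations named in the text; the variable names are chosen for this illustration and do not come from the test manual.

```python
from itertools import product

# The five factors and two domains named in the text.
FACTORS = [
    "fluid reasoning",
    "knowledge",
    "quantitative reasoning",
    "visual-spatial processing",
    "working memory",
]
DOMAINS = ["nonverbal", "verbal"]

# Crossing every factor with every domain yields one subtest per pair.
subtests = [f"{domain} {factor}" for factor, domain in product(FACTORS, DOMAINS)]

for number, name in enumerate(subtests, start=1):
    print(number, name)
```

The loop lists ten subtests, beginning with nonverbal fluid reasoning and ending with verbal working memory, which matches the order given in the paragraph above.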

The basic features of SB5 may be listed as under:

1. The developers of SB5 have screened items for fairness based on variables like gender,
race, ethnicity, disability and religious tradition (Christian, Jewish, Muslim, Hindu and Buddhist
backgrounds). In fact, this is the first time in the history of intelligence testing that religious
tradition has been considered.

2. SB5 is suitable for assessing intelligence from children aged 2 through adults aged 85
and older.

3. SB5 uses a routing procedure for estimating the general cognitive ability of the examinee
before proceeding towards the remainder of the test. The aim of the routing procedure is to identify
the appropriate starting points for later subtests. The routing items belong to both verbal and
nonverbal domains. These items also provide what is called the Abbreviated IQ, which is
sometimes used for screening purposes.

4. The working memory factor, which consists of both verbal and nonverbal items, helps in
assessing and understanding children with attention-deficit/hyperactivity disorder.

In sum, it can be said that SB5 is a very promising intelligence test that is useful at the extremes
of the cognitive spectrum, that is, for the very young and the very gifted person. It is likely to be
a benchmark of individual intelligence testing. In Table 9.1, some important milestones in the
development of SB5 and its predecessors have been presented for better sequential understanding.

Table 9.1 Milestones in the development of SB5 and its predecessors

Year   Test/Authors                                        Basic feature
1905   Binet and Simon                                     Simple 30-item test arranged in
                                                           ascending order of difficulty
1908   Binet and Simon                                     Mental age concept introduced
1911   Binet and Simon                                     Extended to cover adults
1916   Stanford-Binet (Terman and Merrill)                 The concept of IQ introduced
1937   Stanford-Binet-2 (Terman and Merrill)               Two equivalent forms (L and M)
1960   Stanford-Binet-3 (Terman and Merrill)               Modern item analysis methods used
1972   Stanford-Binet-3 (Thorndike)                        SB-3 restandardized on 2,100 persons
1986   Stanford-Binet-4 (Thorndike, Hagen and Sattler)     Complete restructuring into 15 subtests
2003   Stanford-Binet-5 (Roid)                             Five factors of intelligence introduced





Wechsler Scales

In order to remove the difficulties and shortcomings of the Stanford-Binet scale, particularly the
difficulties regarding the measurement of adult intelligence, David Wechsler, working at the
Bellevue Hospital in New York city, developed a new scale for measuring adult intelligence
in 1939. The scale was known as the Wechsler-Bellevue scale, to be used with children above 10
years of age and adults. The Wechsler-Bellevue scale was developed in two equivalent forms, Form I
and Form II. Each form consisted of 10 subtests of which five were verbal tests and five were
nonverbal or performance tests. In 1955 the Wechsler-Bellevue scale was revised and renamed
as the Wechsler Adult Intelligence Scale or WAIS, which consisted of 11 subtests, of which six
were verbal scales and five were nonverbal or performance scales. It was called WAIS-R after a
revision done in 1981 and WAIS-III after a revision in 1997.

The WAIS-III consists of seven verbal subtests and seven performance subtests. The seven
standard verbal subtests and the functions they intend to measure are presented in Table 9.2.


Table 9.2 WAIS-III: Verbal subtests and their functions

Names of the verbal subtests          Major function measured
1. Vocabulary test                    Vocabulary level
2. Similarities test                  Abstract thinking
3. Arithmetic test                    Concentration
4. Digit span test                    Immediate memory, anxiety
5. Information test                   Range of knowledge
6. Comprehension test                 Judgement
7. Letter-Number sequencing test      Freedom from distractibility


The seven standard performance subtests and the functions they intend to measure are
presented in Table 9.3.


Table 9.3 WAIS-III: Performance subtests and their functions

Names of the performance subtests     Major functions measured
1. Picture-completion test            Alertness to details
2. Digit Symbol-coding test           Visual-motor functioning
3. Block design test                  Nonverbal reasoning
4. Matrix reasoning test              Inductive reasoning
5. Picture arrangement test           Planning ability
6. Symbol search test                 Information-processing speed
7. Object-Assembly test               Analysis of part-whole relationship













The seven verbal subtests together make up the verbal scale and the seven performance
subtests together make up the performance scale. Each subtest produces a raw score, that is, a
total number of points. For making a comparison of raw scores on the individual subtests, raw
scores can be converted to standard or scaled scores with a mean of 10 and standard deviation of
3. Two sets of norms have been derived for this conversion: age-adjusted norms and
reference-group norms. Age-adjusted norms were based upon a cumulative
frequency distribution of raw scores for each age group, and reference-group norms were based
on the performance of participants in the standardization sample between ages 20 and 34
(Tulsky, Zhu & Ledbetter 1997). To obtain the verbal IQ (VIQ), the age-corrected scaled scores of
the verbal subtests are added together. The verbal IQ is a deviation IQ with a mean of 100 and
standard deviation of 15.

Like verbal subtests, the raw scores for each of the seven performance subtests are converted
to scaled scores with a mean of 10 and standard deviation of 3. The performance IQ (PIQ) is
obtained by summing the age-corrected scaled scores on the performance subtests. Like verbal
IQ, performance IQ is also a deviation IQ with a mean of 100 and standard deviation of 15.

In addition to verbal IQ and performance IQ, there is a provision for Full Scale IQ (FSIQ). It is
obtained by adding the age-corrected scaled scores of verbal subtests with the performance
subtests and comparing the subject to the standardization sample.
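
The two conversions described above (raw score to scaled score, and sum of scaled scores to deviation IQ) can be illustrated with a minimal sketch. It assumes a simple linear z-score transformation; the actual Wechsler scales use published norm tables, which are not reproduced here. The function names and all norm values in the example are hypothetical and chosen only for illustration.

```python
def z_score(value, norm_mean, norm_sd):
    """Distance of a score from the norm-group mean, in SD units."""
    return (value - norm_mean) / norm_sd


def scaled_score(raw, norm_mean, norm_sd):
    """Subtest scaled score on a scale with mean 10 and SD 3."""
    return 10 + 3 * z_score(raw, norm_mean, norm_sd)


def deviation_iq(sum_scaled, norm_mean, norm_sd):
    """Deviation IQ on a scale with mean 100 and SD 15."""
    return 100 + 15 * z_score(sum_scaled, norm_mean, norm_sd)


# Hypothetical norms: an age group averages 24 raw points (SD 6) on a subtest.
print(scaled_score(30, norm_mean=24, norm_sd=6))     # one SD above the mean

# Hypothetical norms for the sum of seven verbal scaled scores: mean 70, SD 14.
print(deviation_iq(84, norm_mean=70, norm_sd=14))    # one SD above the mean
```

A person who stands one standard deviation above the norm group therefore receives a scaled score of 13 on a subtest and a deviation IQ of 115, whatever the raw-score units of the particular subtest.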

In addition to the verbal IQ, performance IQ and full scale IQ, the WAIS-III provides
opportunity for computing index scores. In this scale, there are four index scores: verbal
comprehension index, perceptual organization index, working memory index and processing
speed index. A schematic overview of WAIS-III has been presented in Table 9.4. The four index scores
tend to assess different aspects of human intelligence.

The verbal comprehension index measures crystallized intelligence. The perceptual
organization index measures fluid intelligence. The working memory index measures the ability to
hold and manipulate information in our mind. In fact, the concept of working memory refers
to the information that the person actively holds in mind, in contrast to our stored knowledge or
long-term memory. The processing speed index measures how quickly the mind of the person works.
For example, one person may solve the given problem in 10 seconds whereas another person
may solve the same problem in 50 seconds.

Table 9.4 A schematic overview of WAIS-III index scores


1. Verbal Comprehension Index       Vocabulary test
                                    Similarities test
                                    Information test

2. Perceptual Organization Index    Picture-completion test
                                    Block Design test
                                    Matrix Reasoning test

3. Working Memory Index             Arithmetic test
                                    Digit span test
                                    Letter-Number sequencing test

4. Processing Speed Index           Digit Symbol-Coding test
                                    Symbol search test


The separate subtest scores of WAIS-III also provide opportunity for pattern
analysis. In such analysis, the investigator evaluates relatively large differences
between subtest scores. For example, a mental disorder like schizophrenia involves poor
concentration and judgement. As a consequence, such a person would tend to score low on arithmetic test and
comprehension test. In fact, Wechsler (1958) had provided a host
of pattern analyses tentatively proposed as diagnostically significant.

As far as the psychometric properties of WAIS-III are concerned, it can be used for assessing
intelligence of persons aged 16 to 85-89 (Tulsky, Zhu & Ledbetter 1997). WAIS-III has impressive
reliability and validity. When split-half method was used, the average coefficients across age
groups were .98 for full scale IQ, .97 for the verbal IQ and .94 for the performance IQ. Test-retest
coefficients were only slightly lower, that is, .94, .94 and .88 for FSIQ, VIQ and PIQ respectively.
The validity coefficients of WAIS-III in terms of correlation with other tests, particularly with earlier
WAIS-R and with the children's version (that is, WISC-III), were high for both individual tests as
well as for verbal IQ, performance IQ and full scale IQ. They ranged from .45 to .90.

WAIS-III has some advantages. Its major advantages are as under:



(i) It incorporates the modern multidimensional nature of human intelligence, including
fluid intelligence and processing speed.
(ii) It incorporates the possibility of pattern analysis.
(iii) It is appropriate and most suitable for assessing adult human intelligence.
(iv) It uses deviation IQ.
(v) It has impressive degree of reliability and validity.
(vi) It uses a point scale.
(vii) It includes performance scale.
(viii) It makes provision for index score which provides a support to multidimensional nature
of human intelligence.


However, WAIS-III also has some disadvantages as under: 


(i) It is a poor measure of extreme levels, that is, high or low level of intelligence. 
(ii) It does not take into consideration the theories of multiple intelligence as enunciated by 
Gardner (1983). 

(iii) It has poor reliability for the individual subtests. 

The latest version of the test is WAIS-IV that was released in 2008. This version is composed
of ten core subtests and five supplemental subtests. The scores on the 10 core subtests constitute
the Full Scale IQ. In WAIS-IV, the verbal / performance subscales from the previous versions have
been deleted and replaced by four index scores as under:

(i) Verbal Comprehension Index (VCI)

(ii) Perceptual Reasoning Index (PRI)

(iii) Working Memory Index (WMI)

(iv) Processing Speed Index (PSI)

In WAIS-IV the General Ability Index (GAI) has been included and it consists of the
Similarities test, Vocabulary test and Information test from the Verbal Comprehension Index and the
Block Design test, Matrix Reasoning test and Visual Puzzles test from the Perceptual Reasoning
Index. The GAI is a good measure of cognitive abilities and is less affected by impairments of
processing speed and working memory. The WAIS-IV is appropriate for use with persons aged 16-90.
In WAIS-IV, two broad scores are also generated that summarize the
general intellectual abilities. One is Full Scale IQ (FSIQ), based on the total combined performance
of VCI, PRI, WMI and PSI, and another is General Ability Index (GAI), which is based only on the six
subtests that VCI and PRI consist of. The subtests of WAIS-IV are given in Table 9.5.







Table 9.5 Subtests of WAIS-IV


Verbal Comprehension Index          Perceptual Reasoning Index
• Similarities (X)                  • Block Design (X)
• Vocabulary (X)                    • Matrix Reasoning (X)
• Information (X)                   • Visual Puzzles (X)
• Comprehension (Y)                 • Picture Completion (Y)
                                    • Figure Weights (Y)

Working Memory Index                Processing Speed Index
• Digit Span (X)                    • Symbol Search (X)
• Arithmetic (X)                    • Coding (X)
• Letter-Number Sequencing (Y)      • Cancellation (Y)

X = Core test; Y = Supplemental test
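
The composition of the two broad WAIS-IV scores can be sketched as a small lookup over the subtests of Table 9.5. This is only an illustration of which core subtests feed the FSIQ and the GAI; the conversion of a sum of scaled scores into a composite score requires the publisher's norm tables, which are not reproduced here. The names `WAIS_IV`, `core_subtests` and `sum_of_scaled` are hypothetical, and the scaled scores in the example are invented.

```python
# Subtests of WAIS-IV by index, as listed in Table 9.5.
WAIS_IV = {
    "VCI": {"core": ["Similarities", "Vocabulary", "Information"],
            "supplemental": ["Comprehension"]},
    "PRI": {"core": ["Block Design", "Matrix Reasoning", "Visual Puzzles"],
            "supplemental": ["Picture Completion", "Figure Weights"]},
    "WMI": {"core": ["Digit Span", "Arithmetic"],
            "supplemental": ["Letter-Number Sequencing"]},
    "PSI": {"core": ["Symbol Search", "Coding"],
            "supplemental": ["Cancellation"]},
}

FSIQ_INDICES = ["VCI", "PRI", "WMI", "PSI"]   # all four indices
GAI_INDICES = ["VCI", "PRI"]                  # only VCI and PRI


def core_subtests(indices):
    """Core subtests belonging to the given indices."""
    return [name for index in indices for name in WAIS_IV[index]["core"]]


def sum_of_scaled(scaled_scores, indices):
    """Sum of scaled scores over the core subtests of the given indices."""
    return sum(scaled_scores[name] for name in core_subtests(indices))


# Invented example: an examinee with a scaled score of 10 on every core subtest.
scores = {name: 10 for name in core_subtests(FSIQ_INDICES)}
print(len(core_subtests(FSIQ_INDICES)), sum_of_scaled(scores, FSIQ_INDICES))
print(len(core_subtests(GAI_INDICES)), sum_of_scaled(scores, GAI_INDICES))
```

The FSIQ thus rests on ten core subtests and the GAI on six, which is why the GAI is less sensitive to weaknesses in working memory and processing speed.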


Downward Extension of the WAIS-III: The WISC-III and the WPPSI-R 


Wechsler made downward extension of WAIS-III so that it could be applied to the assessment of
intelligence of children. For this purpose, Wechsler developed an intelligence scale for children
which was first published in 1949, revised in 1974 and most recently revised in 1991. It is now
known as Wechsler Intelligence Scale for Children-Third Edition (WISC-III). It measures
intelligence from ages 6 through 16 years, 11 months and 30 days. Like WAIS-III, WISC-III also
consists of verbal subtests and performance subtests. In fact, WISC-III contains 13 subtests, three of
which are supplementary. The verbal subtests of WISC-III are: Information test, Comprehension
test, Arithmetic test, Similarities test and Vocabulary test. Digit span test is a supplementary verbal
test. The performance subtests of WISC-III are: Picture completion test, Picture arrangement test,
Block design test, Object assembly test and Coding test. Two supplementary performance subtests
are Maze test and Symbol search test. All these subtests, with the exception of Maze test and
Symbol search test, parallel the corresponding WAIS-III subtests in contents and functions being
measured.

In WISC-III, scaled scores are determined on the basis of raw scores through the norms at
each age level. Like WAIS-III, here too, the mean scaled score is set at 10 and standard deviation
at 3. Scaled scores are summed for obtaining verbal, performance and full scale IQs. These totals
are then compared against a single standard score with a mean of 100 and standard deviation of
15 for each of the three IQs.


Reliability and validity coefficients of WISC-III are highly satisfactory. Split-half reliabilities
for full scale IQ, verbal IQ and performance IQ averaged .96, .95 and .91 respectively. Test-retest
reliability coefficients are only slightly below these values. The correlations between the WISC-III
and WAIS-III and the further downward extension, that is, WPPSI-R, were taken as evidence of
validity. All these coefficients for the full scale, verbal and performance IQs are in the
.70s and .80s. The WISC-III has also been correlated with the Stanford-Binet scale, with the
majority of coefficients in the .60s and .70s for individual subtests and in the .80s to .90s for the
three IQs.
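
The split-half reliabilities quoted above can be illustrated with a minimal sketch. The text does not state how the published coefficients were computed; the sketch assumes one common procedure, namely an odd-even split of the items followed by the Spearman-Brown correction. The function names and the item data are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Estimate full-length reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores (1 = pass, 0 = fail) per examinee."""
    odd_half = [sum(row[0::2]) for row in item_scores]    # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]   # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Invented data: five examinees, six items each.
data = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
]
print(round(split_half_reliability(data), 2))
```

Because each half contains only half of the items, the raw correlation between halves underestimates the reliability of the whole test; the Spearman-Brown step corrects for this shortening.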

As a fair evaluation of WISC-III, it can be said that although it shares many of the strengths of
WAIS-III, it has the following problems:

(i) WISC-III fails to incorporate recent developments in cognitive science and in
multiple intelligence. As we know, cognitive science tends to emphasize intelligence in
terms of the various components of information processing (Hunt 1980) and
executive processes (Sternberg 1984a), organizing skills (Das 1987) and related processes
(Sternberg & Gardner 1982). We get only a meagre reflection of an attempt to incorporate some
of these abilities in WISC-III, in the form of a supplemental symbol search subtest. In
fact, the reality is that in the 1991 revision of this scale, minimal changes have been
introduced.

(ii) WISC-III lacks treatment validity (Witt & Gresham 1985). In fact, specific deficits on
the scale don't necessarily point to a specific treatment because such deficits are rarely
related to remedial intervention for children.

(iii) WISC-III suffers from the problem of selection bias towards certain cultural and racial
groups (Saccuzzo & Johnson 1995). Although this scale makes prediction about school
achievement of children regardless of ethnic background, the reality is that white
children get higher IQ scores than children of other ethnic groups. The obvious result is
selection bias. African-American children are overselected for the mentally retarded
classes whereas they are underselected for the gifted and superior classes.

(iv) WISC-III lacks the index approach of WAIS-III.

The next version of the test is WISC-IV, which was released in 2003, followed by the UK version
in 2004. This fourth edition of WISC was adapted and standardized for India in 2012. The
WISC-IV has 15 subtests, 10 of which formed part of the previous WISC-III. The five new subtests are:
Picture Concepts test, Matrix Reasoning test, Letter-Number Sequencing test, and two
supplemental tests, namely, Cancellation test and Word Reasoning test. On WISC-IV a total of
five composite scores can be derived: Full Scale IQ (FSIQ), which represents overall intellectual
ability, Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Processing Speed
Index (PSI) and Working Memory Index (WMI). The WMI was formerly called the Freedom from
Distractibility Index and the PRI was formerly known as the Perceptual Organization Index. The subtests
included in the four indices are as under:


Verbal Comprehension Index          Perceptual Reasoning Index
• Vocabulary (X)                    • Block Design (X)
• Similarities (X)                  • Picture Concepts (X)
• Comprehension (X)                 • Matrix Reasoning (X)
• Information (Y)                   • Picture Completion (Y)
• Word Reasoning (Y)

Working Memory Index                Processing Speed Index
• Digit Span (X)                    • Coding (X)
• Letter-Number Sequencing (X)      • Symbol Search (X)
• Arithmetic (Y)                    • Cancellation (Y)


'X' indicates core subtests whereas 'Y' indicates supplemental tests.
The latest version of WISC, the WISC-V, was released towards the fall of 2014.


Wechsler developed another scale for children between 4 and 6 years of age. This
was known as Wechsler Preschool and Primary Scale of Intelligence (WPPSI). Thus WPPSI is a
downward extension of WISC. WPPSI was revised in 1989 and called WPPSI-R, and
covers the age range of 3 years to 7 years and 3 months. Like WAIS-III and WISC-III, it has verbal
subtests and performance subtests. In fact, WPPSI-R parallels the WAIS-III and WISC-III in the
format, the method of determining reliabilities and the subtests, too. However, two different
subtests are included only in WPPSI-R: Animal Pegs test and Sentences test. In the Animal Pegs test, children
are required to place a coloured cylinder into an appropriate hole in front of an animal within a
specified time, whereas in the Sentences test, they are asked to repeat sentences told by the
examiner. Both these tests are optional tests.




WPPSI-R also has a sufficient degree of reliability and validity. Reliability coefficients are
comparable to those obtained with other Wechsler scales. The validity coefficient in the form of
correlation against the measure of Stanford-Binet scale was .92.

The third edition of WPPSI called WPPSI-III was released in 2002. Its current
fourth edition called WPPSI-IV was published in 2012 and is considered a
measure of cognitive development for preschool and young children coming in between the age
range of 2 years 6 months to 7 years 7 months. WPPSI-IV includes some changes, such as
new processing speed measures, the addition of working memory subtests and a
factor structure that includes separate visual spatial and fluid reasoning composites. The test
structure of WPPSI-IV includes three levels of interpretation: Full Scale level, Primary Index Scale
level and Ancillary Index Scale level. The details are as under:


Primary Index Scale 
(i) Verbal Comprehension Index (VCI) 
(ii) Visual Spatial Index (VSI) 
(iii) Working Memory Index (WMI) 
(iv) Fluid Reasoning Index (FRI) 
(v) Processing Speed Index (PSI) 


Ancillary Index Scale 
(i) Vocabulary Acquisition Index (VAI) 

(ii) Nonverbal Index (NVI) 

(iii) General Ability Index (GAI)

(iv) Cognitive Proficiency Index (CPI)

Clinical psychologists, school psychologists and neuropsychologists working in schools,
hospitals, clinics and universities use WPPSI-IV for evaluating children for cognitive delays,
intellectual disabilities, autism, giftedness, etc.


Concluding remarks on Wechsler Scales 


All the three Wechsler scales, namely, WAIS-IV, WISC-IV and WPPSI-IV, have definitely improved
as compared to the decades when they had been originally developed. As a consequence, they
have become very popular. Many computer-assisted interpretative programmes and interpretive
guides have also added glory to their popularity (Kaufman 1994; Nicholson & Alcorn 1994).
However, some critics have also expressed their apprehension that even the improved versions of
the Wechsler scales would soon become obsolete in light of the current demands for a link
between assessment instruments / tests and intervention strategies (Shaw, Swerdlik and Laurent
1993; Sternberg 1993). In view of this, it can be said that the weakest point of all the three
Wechsler scales has been their lack of theoretical grounding, which makes it hard to provide a
coherent basis of interpretation.


A comparison between Binet Scales and Wechsler Scales 


There are two profound differences between Binet scales and Wechsler scales. These
can be discussed as under:

(1) Binet scales are age scales whereas Wechsler scales are point scales. Binet scales (from
1908 to 1972) are basically age scales because in these scales, items are grouped by age
level and in each age level two thirds to three fourths of individuals are able to pass
the group of tasks included at that age level. In age scale, arrangement of tasks is not
concerned with their content. At any age level, there may be different tasks related to
different aspects of the content. For example, at a particular age level, one task may be
related to perceptual speed, another task to memory and a different task to
concentration or language skills. At another age level, one task may be related to
reasoning. In this way, different types of content may be scattered throughout the scale.
Besides, in age scale the subjects don't receive a specific amount of credit or points for
each task completed. For example, if Binet scale required that the subject must pass four
out of five items in order to receive a credit, and suppose he passes only three items, he
will receive no credit for this part of the test.

Wechsler scales are point scales where credits or points are assigned to each item. Here the
subject receives the specific amount of credits or points for each item passed. The Binet scale has
not given such credit. On the modern Binet scale, a basal score may be arrived at, whether the
subject is correct on three out of four or on all four items at two consecutive levels. The person
who gets three items correct obtains the same basal score as the one who gets all four correct. In
this way it can be said that although modern Binet scale has abandoned the concept of age scale,
it has not yet adopted the point scale concept.

(2) Binet scale and Wechsler scale differ from each other on the basis of performance scale 
concept. In the Binet scale, there is emphasis upon verbal skills and language for which it 
has been criticized from time to time. Wechsler was aware of this problem and so he 
included performance scale in this test. In the performance scale, the subject is required 
to do something and that becomes the basis for his intelligence. 

Although early Binet scales contained some performance tasks, these were mostly restricted
to younger age levels. Apart from this, results of the subject's response to a performance task on
the Binet scale were extremely difficult to separate from the results for verbal tasks. As a
consequence, it was very difficult to tell to what extent a subject's response to a performance task
increased or decreased the total score on the test. In Wechsler scale, the picture is entirely
different because it contains two separate scales: one for verbal task, called verbal scale, and
another for performance task, called performance scale. In fact, Wechsler intelligence scale was
the first to offer the possibility of directly comparing an individual's verbal and nonverbal
intelligence. That is, verbal scales and performance scales were standardized on the same sample
and the results of both scales were manifested in comparable units.

In this way, we find that the Binet scales and Wechsler scales differ from each other on two
major counts.
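
The first of these two differences, namely all-or-none credit in an age scale versus item-by-item credit in a point scale, can be sketched with the four-out-of-five example used above. The function names are hypothetical and the rule of one point per item is an assumption made only for illustration; actual Wechsler items may carry different numbers of points.

```python
def age_scale_credit(items_passed, required):
    """All-or-none credit: the level is credited only if the criterion is met."""
    return 1 if items_passed >= required else 0


def point_scale_credit(item_results, points_per_item=1):
    """Point-scale credit: every passed item earns its points."""
    return points_per_item * sum(1 for passed in item_results if passed)


# The subject passes three of the five items at a level that requires four passes.
results = [True, True, True, False, False]

print(age_scale_credit(sum(results), required=4))   # no credit for this level
print(point_scale_credit(results))                  # credit for each item passed
```

Under the age-scale rule the three passed items are lost altogether, whereas the point scale retains them as three points, which is why point scales allow items of similar content to be grouped and scored separately.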


The Kaufman Scales 


Developed in the 1980s and 1990s, the Kaufman scales tend to incorporate recent developments
in test construction. The Kaufman scales are individually administered tests, designed for many
different uses. There are three types of intelligence tests included in Kaufman's scales:
(1) Kaufman Assessment Battery for Children (K-ABC)
(2) Kaufman Adolescent and Adult Intelligence Test (KAIT)
(3) Kaufman Brief Intelligence Test (K-BIT)

These scales may be discussed as under:


(1) Kaufman Assessment Battery for Children (K-ABC): The K-ABC is an individual
intelligence test for children between 2½ and 12½ years of age (Kaufman & Kaufman
1983a, 1983b). The test consists of 16 subtests combined into five global scales called
sequential processing, simultaneous processing, mental processing composite (a
combination of sequential and simultaneous processing), achievement and nonverbal.

In fact, the K-ABC measures intelligence through its mental processing scales, namely,
sequential processing, simultaneous processing and mental processing composite. Theoretically, the K-ABC
is based upon several types of approaches. For example, it combines
the neuropsychological model of brain functioning of the Russian neuroscientist Luria
(1966), the theory of split brain functioning of Sperry (1968) and the theories of information
processing of the famous cognitive psychologist, Neisser (1967). In the works of these scientists,
Kaufman (1984) clearly noted a distinction between two types of higher-brain processes, which
was referred to as sequential-simultaneous distinction. Sequential processing refers to
children's ability to solve problems by mentally arranging inputs into a serial or sequential
order, as we find in case of number and word-order recall. Simultaneous processing, which takes
place in parallel, refers to children's ability to synthesize and integrate information
(A S Kaufman, R W Kaufman, and M L Kaufman, 1985). In K-ABC, the purpose of
providing separate measures of simultaneous and sequential processing is to identify the child's
unique strengths and problem-solving strategies that can help others develop
intervention strategies for a child.
Besides sequential-simultaneous distinction, the K-ABC also provides for an independent
achievement score. In fact, offering an independent measure of intelligence (that is,
sequential-simultaneous processing) and achievement is one of the major advantages of K-ABC.
In addition, K-ABC also includes a nonverbal scale which provides a measure for
children who are linguistically different and handicapped (Flanagan 1995, Reynolds &
Kamphaus 1997).

The obtained raw scores on each of 16 subtests are converted to standard scores with a mean
of 10 and standard deviation of 3. The global scale score can be converted into a standard score
with a mean of 100 and standard deviation of 15, percentile and age norms.

Reliabilities and validities are also sufficient. Split-half reliabilities start in the .70s for
preschool children and range from .71 to .85 for school-age children. Likewise, split-half coefficients
for the global scales range from .86 to .93 for preschool children and .89 to .97 for school-age
children. Test-retest coefficients are slightly lower than these coefficients. The test manual reports a
variety of validity data which have received considerable support. For example, factor-analytic
studies support its sequential-simultaneous and mental-processing-achievement distinctions
(McEsters et al. 1998).

The K-ABC has been criticised. Jensen (1984) has pointed out that this test, as compared to
Binet and Wechsler scales, has poorer predictive validity for school achievement as well as less
effective assessment of general intelligence. Other criticisms relate to the imperfect match of the
K-ABC's model with its theoretical foundation and disproportionate contribution of mental processing
composites to intelligence. Besides, this test is highly visual, which limits its use with visually
handicapped children, and it also does not measure the extremes of intelligence (Aiken).
Sternberg (1984b) has also criticised this test bitterly. He has pointed out that the test suffers from
a noncorrespondence between its definition and its measurement of intelligence. He has also
questioned the empirical support for the theory underlying K-ABC. He has further pointed out
that the test gives overemphasis on rote learning at the expense of the ability to learn.


The second edition of KABC called KABC-II was published in 2004 and assesses the
processing and cognitive abilities of children as well as adolescents in between ages 3 and 18.
KABC-II differs in its conceptual framework and test structure from original KABC. It is so because
while KABC is tied to simultaneous/sequential processing approach, the KABC-II incorporates two
different theoretical models: the Cattell-Horn-Carroll (CHC) psychometric model of abilities and
Luria's neuropsychological theory of processing.

KABC-II has 18 subtests of two types: core and supplemental. Before the testing begins, the
examiner has to decide which model to follow: Luria model or CHC model. The Luria model includes
four scales: simultaneous processing scale, sequential processing scale, learning ability and
planning ability. CHC model renames simultaneous processing scale as visual processing (Gv),
sequential processing scale as short-term memory (Gsm), learning ability as long-term storage
and retrieval (Glr) and planning ability as fluid reasoning (Gf). In CHC model, an additional scale,
crystallized ability (Gc) or knowledge, has also been included.

KABC-II includes the following subtests:

1. Simultaneous / Gv
• Triangles
• Face recognition
• Block counting
• Conceptual thinking
• Rover
• Gestalt closure
• Pattern reasoning (ages 5 and 6)
• Story completion (ages 5 and 6)

2. Sequential / Gsm
• Word order
• Number recall
• Hand movements

3. Planning / Gf
• Pattern reasoning (ages 7-18)
• Story completion (ages 7-18)

4. Learning / Glr
• Atlantis
• Atlantis delayed
• Rebus
• Rebus delayed

5. Knowledge or Crystallized ability (Gc) (only in CHC model)
• Riddles
• Expressive vocabulary
• Verbal knowledge

There are two general intelligence composite scores: Mental Processing Index (MPI)
based on Luria's model and Fluid-Crystallized Index (FCI) based on CHC model. In general, Luria
model takes about 25-60 minutes in its administration whereas the CHC model takes about
30-75 minutes, depending upon the age of the child. This test helps the researcher in identifying
a person's strengths and weaknesses in cognitive ability and mental processing.

(2) Kaufman Adolescent and Adult Intelligence Test (KAIT): The KAIT was constructed as a
measure of intelligence for persons whose ages ranged from 11 to 85 years or older
(A S Kaufman & N L Kaufman 1993). In fact, this test represents an attempt to integrate the
theory of fluid and crystallized intelligence formulated by Horn & Cattell (1966) with
notions about adult intelligence proposed by other theorists (Luria 1980; Piaget 1972).
The test is composed of a crystallized scale and a fluid scale. The former measures
concepts acquired from schooling and acculturation whereas the latter measures the
ability to solve new problems.

The KAIT incorporates two components: a core battery and an expanded battery.
The core battery includes a fluid intelligence scale, a crystallized intelligence scale and six
subtests, and takes about 1 hour and 15 minutes to complete. The expanded battery
includes the core battery tests as well as four additional subtests and takes over an hour
in its completion. The core battery subtests related to fluid intelligence are:
logical steps (a test of sequential reasoning), mystery codes (a test measuring induction) and
Rebus learning (a test of long-term memory). The core battery subtests related to
crystallized intelligence are: definitions (a test of word knowledge and language development),
double meanings (a measure of language comprehension) and auditory
comprehension (a test of listening ability).

The expanded battery also incorporates memory for block designs, a measure of
visual processing related to fluid intelligence; famous faces, a test of cultural knowledge
related to crystallized intelligence; and Rebus delayed recall as well as auditory delayed
recall.

In KAIT battery, there is an optional supplemental mental status examination. Since
KAIT yields IQ scores in a relatively wider range, that is, from much lower-than-average
intelligence to much higher-than-average intelligence, it is often used as an assessment
of individuals with exceptional abilities like gifted children.


(3) The Kaufman Brief Intelligence Test (K-BIT): K-BIT was designed as a quick screening device
for estimating the level of intellectual functioning (Kaufman & Kaufman 1990). It covers
the age range of 4 to 90 years. It is an individually administered test but is not a shortened
version of either the K-ABC or the KAIT. It consists of verbal scale, nonverbal scale and a
composite. The verbal scale consists of one verbal subtest of 45 expressive vocabulary
items and 37 definitions, and the nonverbal scale consists of 48 matrices. The three scores,
namely, verbal score, nonverbal score and the composite score are expressed in
deviation IQ units. It has a higher reliability coefficient.

The second edition of KBIT called KBIT-2 was published in 2004. This is a brief
intelligence test for individuals from ages 4 to 90 years. KBIT-2 consists of verbal
subscales, nonverbal subscales and IQ composite. The verbal scale comprises two
combined subtests, which assess receptive vocabulary and general information as well
as reasoning, comprehension and vocabulary knowledge (Riddles). The nonverbal
scale incorporates the Matrices subtest to tap the ability to complete visual analogies and
understand relationships. KBIT-2 requires 15 to 30 minutes for its completion. Adults,
gifted children, as well as examinees with a reflective style of test taking may take a longer
time. It is also worth noting that none of the three subtests of KBIT-2 is timed.


Nonverbal Intelligence Scales 

In their efforts to construct intelligence tests which can be applied across cultures, 
psychometricians have developed many nonverbal scales which can be administered either 
individually or in a group. Two important such scales to be discussed here are: Raven's 
Progressive Matrices (RPM), and Goodenough-Harris Drawing test. 

The Raven's Progressive Matrices (RPM) test is one of the best known and most popular
intelligence tests that can be used either in a group or individually. It covers people from 3 years
to elderly adults. It consists of 60 multiple-choice questions listed in order of difficulty (Kaplan &
Saccuzzo 2009). It has been designed primarily as a measure of Spearman's g factor of general
intelligence (Raven 1983; J Raven, J C Raven & Court 1995). The items of the test consist of a set
of matrices or arrangements of designs into rows and columns, from each of which a part is
missing. The task of the subject is to choose the missing insert from the given alternatives for
completing a pattern. Many patterns are presented in the form of 6 x 6, 4 x 4, 3 x 3 or 2 x 2 matrices, which
gives the test its name. The easier items simply require accuracy of discrimination but the more difficult
items require some complex processes like analogies, permutations, alterations of pattern and
other logical relations. The test is usually administered with no time limits.

The RPM is available in three different types of forms: the Standard Progressive Matrices
(SPM), the Coloured Progressive Matrices (CPM, 1990 edition) and the Advanced Progressive
Matrices (APM, 1994 edition). SPM is suited for average individuals.
CPM is available for younger children or special persons, especially mentally
retarded ones who cannot be tested with the SPM for one or the other reasons. APM has been
developed for above-average adolescents and adults.



Reliabilities and validities of RPM were high and satisfactory. Test-retest reliabilities range
from .70 to .90, and internal consistency coefficients are also high. Validity
coefficients with both verbal and performance tests of intelligence have been reported,
providing a higher value with performance tests than with verbal tests.

Goodenough Draw-a-Man test is another nonverbal test of intelligence. It is a
simple test in which the examinee is simply instructed to make a picture of a man as best as
he can. This test was originally developed in 1926 and was used in its original form until 1963,
when it was revised by Harris and retitled the Goodenough-Harris Drawing test. In
this revised form, like in the original test, the emphasis in drawing is placed upon the child's
accuracy of observation and on development of conceptual thinking rather than on artistic
skills. Subjects get credit for each item included in their drawings. As a rule, each detail is given
one point, with possibility of a total of 70 points. In this revised form, there are three scales: the
Man Scale, the Woman Scale and the Self Scale. In the Man Scale, the examinee is asked to draw
a picture of a man; in the Woman Scale, the examinee is asked to draw a picture of a woman. In
the Self Scale, which is basically treated as a projective test of personality, the examinee draws a
picture of himself. The point scores on each of the three scales are translated into standard scores
with a mean of 100 and standard deviation of 15. The test is administered to ages 3 years to 15
years 11 months. However, the preferred age is 3 to 10 years.

This test has sufficient degree of reliabilities and validities. The test-retest reliabilities, as well 
as split-half and scorer reliability, are adequate (Dunn 1967; Harris 1963). The validity of the test 
is provided by correlations with other intelligence tests and although these correlations vary 
widely, the majority are above .50. | 

All these nonverbal scales and similar others, although initially developed for cross-cultural 
testing, have also been widely and popularily used in clinical and counseling settings. As already 
said, these nonverbal tests can be used either in individual situations or in groups. Besides these, 
there are some exclusive group tests which will be briefly discussed here. : 

History bears testimony to the fact that mass testing began during Jp mses : “- 
development of the Army Alpha test and the Army Beta test for use In the Unite a . ee 
Army Alpha bal test designed for the purpose of general screening an sie 

y Alpha test was a verbe MS ) test for use with persons who could not 
purposes, whereas the Army Beta test was 4 nonlanguage tes d. The pattern established by 
be properly tested due to illiteracy OF foreign-language amen es group tests for civilian 
these tests was closely followed in the subsequent development O° N's" Y 
applications. 

Three group tests of intelligence are worth 
Henson-Nelson test and Cognitive abilities test 

The Kuhlmann—Anderson test (KAT) is a 8FO™ 


ee | ct, it cover sriaty of items on 
“vering kindergarten through 12th grade. reed veveral tests with a variety . — 
“indergarte | ; d it each level CO aoe +hildren but also to Tes 
oa level Aenea and it's suited not only 10 ¥ b 


+ adition of the KAT can 
alte of the latest edition ¢ _ 

i might be handi di verbal expression>- The eoontee total scores Can be expressed 
© €xpre cil se | and total scores: a pon the form of percentile band which 
ag in ssed In verbal, quantitative ! " éaii be expressed In 1 ; hy - most likely represent a 
Is |j Ra IQs. Scores at the —, the range of per -entiles 
K€ a confidence interval prov'd!" : 
JEct’s ane . gre in low 

‘ true score. lit-half coefficients. © lidity is concerned, the KAT 
Reliability coefficients, particularly ae gQs. $0 far 4° the va 
~“TClents range from low 80s to the high - 


retest reliabilities ran 
| | 45 ranged 
argely in the .80s and .90s. Validity 


e of intelligence. This is a very 
icture of a man as accurately as 
without change until 1963. In 
nough—Harris Drawing Test. In 


mentioning here: Kuhlmann-Anderson test, 
ch measures intelligence 


‘ntelligence test whi i. eel 
P agi 5 eight separate levels in between 


Sub 


Os and test-retest 


~ 


174 Tests, Measurements and Research Methods in Behavioural Sciences 


correlates highly with different intelligence tests. In sum, it can be said that th 
extremely sound group nonverbal test of intelligence. OE RAT i 

The Henmon-Nelson test is another group mental ability test for all grade levels. It
produces a single score which is believed to reflect general intelligence. Two sets of norms are
available. One set is based on raw score distributions by age and the other set on raw score
distributions by grade. There are provisions for raw scores to be converted into deviation
IQs as well as percentiles. It consists of 90 items and takes about 30 minutes to complete.
The reliability coefficients, particularly split-half and test-retest, run in the .90s. As for validity,
the test correlates well with a variety of intelligence tests and these validity
coefficients range from .50 to .84. The major advantage of this test is that it can readily help
predict future academic success. However, by providing only a single score based on
Spearman’s g factor, this test does not consider multiple intelligences. Likewise, when the test was
being developed, no special effort was made to check for content bias, either by judges or
by statistical analysis. Another problem with this test is its relatively low ceiling. For example, to
achieve an IQ of 130 a ninth-grade child would have to answer about 85 out of 90 items
correctly. This automatically leaves only five items to discriminate among all those above 130.


The Cognitive Abilities Test is another important group intelligence test which provides three
separate intelligence scores: quantitative, verbal and nonverbal. This test is designed for
measuring the intelligence of poorly educated people as well as of people for whom English is a
second language. The test authors have taken special care to eliminate cultural bias and
the effect of test-taking skills, and its test administration includes extensive practice
exercises. Each of the three subtests of the Cognitive Abilities Test requires 32 to 34 minutes of actual
working time, which should be spread over two to three days. The reliabilities, that is, K-R 20, for the
verbal test are in the high .90s, for the quantitative test in the low .90s and for the nonverbal in the high .90s.

On the negative side, the three tests of the Cognitive Abilities Test are claimed to be primarily
power tests, but no data are provided in support of this claim. Likewise, uncertainty remains as to whether
the norms, which appear strong, are representative (Kaplan & Saccuzzo 2001).


Some Indian Intelligence Tests 


Many types of intelligence tests have been developed in India, a glimpse of which can be had by
examining a selected number of abstracts. One of the pioneers of intelligence testing in India is
Professor S M Mohsin, who constructed an intelligence test in Hindustani in the late 1930s
(Pandey 2000). In the first Mental Measurement Handbook for India, a list of 103 tests of
intelligence in different Indian languages, including the test of general intelligence by Professor
Mohsin, has been provided (Long & Mehta 1966). Samples, reliability, validity and norms were
highly satisfactory. In fact, documentation of Indian tests is done by the National Library of
Educational and Psychological Tests (NLEPT) at the National Council of Educational Research
and Training (NCERT). NLEPT has published an Indian mental measurement handbook of
intelligence tests. Some of the popular Indian intelligence tests are the General Mental Ability Test for
children by R P Shrivastava and K Saxena, Group Test of Intelligence by P Ahuja, Group Test of
Intelligence by G C Ahuja, Verbal Intelligence Scale by R K Ojha and Ray Chaudhary, Test of
General Intelligence for college students by S K Pal and K S Mishra, Group Test of Intelligence by
R K Tandon, Emotional Intelligence Scale by A Hyde, S Pethe and Upinder Dhar, Social
Intelligence Scale by N K Chadha and Usha Ganesan, and Indian Child Intelligence Test by Usha
Khire. These intelligence tests are marketed by the National Psychological Corporation.

Kumar (1991) has provided a decadewise analysis of the development and nature of intelligence
tests in India. He reported that approximately 70 per cent of intelligence tests were developed and
standardized during the 1960s and 1970s. However, there occurred a decline in the 1980s. Of these
tests, 67 per cent were verbal, 23 per cent nonverbal and 10 per cent were performance tests.
Many of these tests were developed in Hindi and they were group tests meant for assessing
the intelligence of persons 13-17 years of age.


TYPES OF INTELLIGENCE TEST SCORES

Usually, there are three types of intelligence test scores which are very common: the mental age,
the intelligence quotient and the deviation intelligence quotient. A detailed discussion of each of
these intelligence scores is as under:


Mental Age

The concept of mental age is very popular in the measurement of intelligence and it was for the
first time introduced by Binet, a French psychologist, in 1908, when he published the first
revision of his intelligence test, the Binet-Simon scale. Mental age, in the words of Tuckman
(1975, 329) refers to “a score that is determined by comparing a child’s score with the scores
obtained by younger and older children in the norming group.” Thus mental age is a score which
is obtained by comparing a child’s obtained score with the average scores (called age norms) at
different age levels. For example, suppose a 10-year-old child obtains a score of 60 on a certain intelligence
test, which is an age scale. Further, suppose that the mental age norms on this test indicate that a
child in order to have the mental age of 9 and 10 years must obtain a score of 60 and 70
respectively. In such a situation his mental age will be 9 years (and not 10 years), although his real
age is 10 years. Likewise, there may be another child who is 8 years old, but obtains a score of 70
on that test. In such a situation, he will be said to have a mental age of 10 years although his
chronological age is only 8 years. Mental age is usually expressed in terms of years and months
such as 6 years 2 months, 6 years 6 months, 8 years 6 months, and so on. The most common way
to derive mental age is to obtain a representative sample at each of the age levels for which the
intelligence test is being developed. For each age level, items are prepared and given to the sample.
The mean (or average) performance of each age level becomes the mental age norm for that age group.
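
The norm lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the norms for ages 9 and 10 reuse the hypothetical figures from the example (scores of 60 and 70), the norm for age 8 is invented, and the function name `mental_age` is not taken from any published scoring manual.

```python
# Hypothetical age norms: average raw score at each age level (in years).
# The values for ages 9 and 10 come from the worked example in the text;
# the value for age 8 is invented purely for illustration.
AGE_NORMS = {8: 50, 9: 60, 10: 70}


def mental_age(raw_score, norms=AGE_NORMS):
    """Return the highest age level whose norm the raw score reaches."""
    reached = [age for age, norm in norms.items() if raw_score >= norm]
    if not reached:
        raise ValueError("score is below the lowest age norm in the table")
    return max(reached)


# A child scoring 60 is credited with a mental age of 9 years,
# whatever his chronological age happens to be.
print(mental_age(60))
# A child scoring 70 is credited with a mental age of 10 years.
print(mental_age(70))
```

A real age scale would report mental age in years and months and would interpolate between age levels; the sketch only shows the basic idea of reading a mental age off the age norms.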


Intelligence Quotient

As mentioned earlier, the concept of intelligence quotient (or IQ) was for the first time introduced
in the 1916 American revision of the Binet-Simon scale done at Stanford University by Terman.
IQ is defined as the ratio of mental age to the chronological age or real age, multiplied by 100 to
avoid decimal fractions. It is, therefore, also known as the ratio IQ. The equation for IQ is:

$$\mathrm{IQ} = \frac{\text{Mental age}}{\text{Chronological age}} \times 100 \qquad (9.1)$$

Suppose a 10-year-old child (chronological age 120 months) has a mental age of 120 months; then
his IQ = (120/120) x 100 = 100, which would be considered as normal or average in the sense that
his mental age and chronological age are exactly equal. However, if another 10-year-old child has
a mental age of 156 months, his IQ = (156/120) x 100 = 130, which obviously exceeds the average
(that is, 100). Such a child would be considered to be of very superior intelligence in the sense
that his performance is much above the performance of his age-mates and is comparable to
children three years older than himself. If, however, a 10-year-old child has a mental age of 96
months only, his IQ = (96/120) x 100 = 80, which obviously falls below the average, and he will be
considered as dull in the sense that his performance is below the average of his age-mates and is
comparable to children two years younger than him. Many psychologists have worked on different
IQ classifications. As a result, a consensus (setting aside some minor variations) has emerged for
classifying persons on the basis of IQs as provided in Table 9.6.
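
The three worked examples can be checked with a short Python sketch of equation (9.1). The function names `ratio_iq` and `classify` are illustrative, and the way `classify` treats fractional values falling between the printed bands of Table 9.6 (for example, 129.5) is an assumption made here, since the table lists whole-number ranges only.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio IQ as in equation (9.1): (mental age / chronological age) x 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_months / chronological_age_months


def classify(iq):
    """Map an IQ to the bands of Table 9.6 (lower bounds are inclusive)."""
    if iq > 140:
        return "Genius"
    if iq >= 130:
        return "Very superior"
    if iq >= 120:
        return "Superior"
    if iq >= 110:
        return "Bright"
    if iq >= 90:
        return "Average"
    if iq >= 80:
        return "Dull"
    if iq >= 70:
        return "Borderline"
    return "Mentally defective"


# The three 10-year-old children (120 months) from the text.
for mental_age in (120, 156, 96):
    iq = ratio_iq(mental_age, 120)
    print(mental_age, iq, classify(iq))
```

Running the loop reproduces the IQs of 100, 130 and 80 given in the text, together with the labels average, very superior and dull.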


Table 9.6 A general classification of persons in terms of IQ

IQ              Classification
Above 140       Genius
130-140         Very superior
120-129         Superior
110-119         Bright
90-109          Average
80-89           Dull
70-79           Borderline
69 and below    Mentally defective

Although the concept of the IQ ratio has been very popular in measuring intelligence, it has
some important shortcomings.

1. It is a known fact that after 16 or 19, the mental age of a person tends to stabilize, with a
few rare exceptions. Later, the chronological age continues to increase while the mental age
remains more or less static. In other words, the numerator of the IQ equation tends to be constant
while the denominator continues to inflate. The natural outcome of this will be that the person
seems to be less and less intelligent as he or she grows older in age.

2. Another shortcoming, as has been pointed out by many psychologists, is that the
variability in IQ scores from one test to another is not the same. Naturally, then, on a test with greater
variability an individual will have to obtain a higher score for being as far above average or
normal as on a test with less variability. In such a case, the IQ scores of the different tests would
not be directly comparable.

3. It has been commonly seen that the variability in IQ scores for the different age levels on
the same test is not the same. In other words, the standard deviations of the IQ scores are not the
same for the different age levels on the same test. In such a case, the IQ score would be a
misleading index because this would indicate that a person’s IQ score varies as he grows older
though his position compared to his age-mates remains the same. Thus a 10-year-old child’s IQ
score of 120 (where SD is 12) may mean a different thing when it is compared with a 7-year-old
child’s IQ score of 120 (where SD is 9). Thus, the meaning of the same IQ score of 120 at the two
age levels with different standard deviations would not be identical (whereas it should be).
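
The first shortcoming is easy to see numerically. The sketch below assumes, purely for illustration, that mental age rises in step with chronological age up to 16 years and then stays fixed; the plateau age and the function name `ratio_iq_by_age` are assumptions, not figures from any test manual.

```python
PLATEAU_MONTHS = 16 * 12  # assumed age at which mental age stops rising


def ratio_iq_by_age(ages_in_years, plateau_months=PLATEAU_MONTHS):
    """Ratio IQ of a hypothetical average person at each chronological age."""
    result = {}
    for years in ages_in_years:
        chronological = years * 12
        # Mental age tracks chronological age until the plateau, then freezes.
        mental = min(chronological, plateau_months)
        result[years] = round(100 * mental / chronological)
    return result


print(ratio_iq_by_age([16, 20, 32, 48]))
```

Under these assumptions the ratio IQ of the same person is 100 at age 16 but falls to 80 at 20, 50 at 32 and 33 at 48, although nothing about the person's ability has changed. This is the artefact that the numerator and denominator argument describes.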


The IQ ratio was a very popular concept until about 1960. After that it was practically
replaced by the concept of deviation IQ or DIQ so that the above shortcomings, particularly the
third one, could be removed. Nowadays the IQ ratio score is easily converted into a
standard score with a fixed mean of 100 and a fixed SD of 15 or 16. Such a converted IQ ratio score
is known as the deviation IQ score or DIQ score. This fixed mean and SD are kept constant
through all the age levels (for which the test is meant) and for all levels of a single test. If a child’s
IQ lies one SD above the mean, his DIQ score would be 100 + 15 or 16 = 115 or 116, which
would indicate that his DIQ lies above the average of his age-mates. Likewise, if a child’s IQ lies one
SD below the mean, his DIQ score would be 84 or 85, which would indicate that his DIQ lies
below the average of his age-mates. Thus in DIQ the standard deviation has a fixed value at all
age levels and thereby controls the aberrant variability found in the IQ ratio. A detailed discussion of
the DIQ score has already been given in Chapter 7.
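
The conversion can be sketched in Python as follows. The sketch assumes the usual linear standard-score transformation (a z-score rescaled to mean 100 and SD 15); the function name `deviation_iq`, the age-group mean of 100 and the SDs of 12 and 9 are taken from the illustration of the third shortcoming or assumed for the example, not from any specific test manual.

```python
def deviation_iq(score, group_mean, group_sd, mean=100, sd=15):
    """Rescale a score to a standard score with a fixed mean and SD."""
    if group_sd <= 0:
        raise ValueError("group SD must be positive")
    # Distance from the age-group mean in SD units, rescaled to the fixed SD.
    return mean + sd * (score - group_mean) / group_sd


# One SD above or below the mean always gives 115 or 85 (SD fixed at 15).
print(deviation_iq(112, group_mean=100, group_sd=12))
print(deviation_iq(88, group_mean=100, group_sd=12))

# The same ratio IQ of 120 at two age levels with different SDs.
print(round(deviation_iq(120, 100, 12), 1))  # 10-year-olds, SD 12
print(round(deviation_iq(120, 100, 9), 1))   # 7-year-olds, SD 9
```

The last two lines show why the rescaling matters: a ratio IQ of 120 corresponds to a DIQ of 125 where the age-group SD is 12 but to about 133 where it is 9, so the two children do not in fact hold the same position among their age-mates. Passing `sd=16` gives the alternative scale mentioned in the text.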





DISTINCTION BETWEEN APTITUDE TEST AND ACHIEVEMENT TEST

The terms ‘intelligence’, ‘aptitude’ and ‘achievement’ are commonly used in the field of
psychological and educational testing. It is, therefore, essential to make a distinction among
these terms as well as among ‘intelligence test’, ‘aptitude test’, and ‘achievement test’. It is
difficult, if not impossible, to distinguish between the two terms intelligence and aptitude. As
discussed previously, intelligence refers to a general set of mental abilities. On the basis of these
mental abilities, a child’s performance in the future can be predicted. Aptitude, on the other
hand, is a person’s ability, acquired or innate, to learn or develop knowledge of a skill in some
specific area. Freeman (1962, 431) has defined aptitude as “a combination of characteristics
indicative of an individual’s capacity to acquire (with training) some specific knowledge, skill or
set of organised responses such as the ability to speak a language, to become a musician, to do
mechanical work.” Likewise, Tuckman (1975, 475) has defined aptitude as “a combination of
abilities and other characteristics, whether native or acquired, known or believed to be indicative
of an individual’s ability to learn or to develop proficiency in some particular area.” Obviously,
both Freeman and Tuckman intend to say more or less the same thing regarding the meaning of
aptitude, which, according to them, refers to the capacity or ability to acquire skill or knowledge
in a particular area. On the basis of such abilities, the future performance of a child can be
predicted. Achievement refers to what a person has acquired or achieved after the specific
training or instruction has been imparted. In other words, achievement tests are primarily
designed to measure the effects of a specific programme of instruction or training (Anastasi 1968).
Thus, the performance on the achievement test indicates the performance under known and
controlled conditions (because the performance is the outcome of specific training given in a
specific field), whereas aptitude tests indicate performance that is the product of a multiplicity of
experiences of everyday life and therefore, indicate the effect of learning under unknown and
relatively uncontrolled conditions.

When the meanings of intelligence, aptitude and achievement are known, it is easier to
distinguish among intelligence test, aptitude test and achievement test. Freeman (1962, 431) has
defined aptitude test as one “designed to measure a person’s potential ability in an activity of a
specialised kind and within a restricted range.” Achievement test, also known as proficiency test,
is one which measures the extent to which a person has acquired or achieved certain information
or proficiency as a function of instruction or training (Tuckman 1975). In attempting to distinguish
among these tests, intelligence tests and aptitude tests may be placed on one side and
achievement tests on the other. Aptitude tests (as well as intelligence tests) are future oriented
whereas achievement tests are present and past oriented (Stanley & Hopkins 1972). The
primary purpose of an aptitude test (and intelligence test) is to predict what a person can learn,
whereas the primary purpose of an achievement test is to measure what a
person has learned. In other words, intelligence tests and aptitude tests are psychometric tests
whereas achievement tests are edumetric tests. A psychometric test is one whose major intention
is to measure individual differences like a general ability, aptitude or trait, etc., and an edumetric
test is one whose major intention is to measure the growth of individuals such as
gain of skill, proficiency and achievement (Carver 1974, 513). However, a single test
can be used either in a psychometric way or in an edumetric way depending upon the purpose of the
test user. For example, when a numerical aptitude test is used for the purpose of predicting the future
numerical ability of a person it would be considered to be a psychometric test, but when the
same test is used to measure the numerical ability of a person after completion of some training, it
would be considered to be an edumetric test. Psychologists, in general, use the test in a
psychometric sense and educators, in general, use the test in an edumetric sense.
However, the distinction between the aptitude test and the achievement test is not very clear in
practice. In a recent study, Bracht and Hopkins (1972) have demonstrated that an
educational achievement test (obviously, a kind of achievement test) predicts later achievement
better than the intelligence test. Now, psychometricians are using a new term, ‘test of developed
ability’, in place of aptitude tests and achievement tests. The assumption underlying the use of the
term is that all general intelligence tests, aptitude tests (multiple aptitude batteries and special
aptitude tests) and achievement tests are ability tests and simply measure the developed ability of
a person in one or more fields.

TYPES OF APTITUDE AND ACHIEVEMENT TESTS 


Aptitude Tests 


Aptitude tests may conveniently be grouped into two categories: multiple aptitude tests and
special aptitude tests. Multiple aptitude tests are those which intend to measure several aptitudes,
each by an independent subtest. Hence, multiple aptitude tests are not single tests but rather batteries of
tests. Special aptitude tests are those which intend to measure only one aptitude. The latter came
earlier than multiple aptitude tests. Among the earliest special aptitude tests was the mechanical
aptitude test. Special aptitude tests were developed at the time when the primary emphasis in
testing was placed on the general intelligence test and it was thought that the
individual measurement of intelligence must be supplemented by the measurement of special
aptitudes like mechanical, numerical, etc. But later researches, particularly the factor-analytic
researches conducted by Thurstone and Guilford, revealed that intelligence itself consists of
several independent special aptitudes and since then, the emphasis has shifted from the special
aptitude test to the multiple aptitude test. Not only this, several aptitude tests are now easily
included in multiple aptitude batteries. In the following sections, some of the aptitude tests
representative of the above two broad categories are discussed.


Differential Aptitude Tests 


The Differential Aptitude Test or DAT is one of the most common multiple aptitude tests. First
published in 1947, the battery has undergone several revisions and restandardizations and
presently, it is available in its fifth edition, done in 1992. The battery has been developed by
Bennett, Seashore and Wesman and comprises eight subtests: verbal reasoning, numerical
reasoning, abstract reasoning, mechanical reasoning, clerical speed and accuracy, space
relations, spelling, and language usage. The battery is mainly meant for educational and
vocational counselling of students from grade 8 through 12. The whole battery has two
equivalent forms, S and T, and it takes roughly three hours to administer. Scores on each
subtest are converted into percentile ranks for their proper interpretation. In addition to the eight
subscores, a ninth subscore is provided by adding the scores on the verbal reasoning (VR) test and the
numerical reasoning (NR) test and, subsequently, the composite score is transformed into a
percentile rank. The VR + NR score becomes the index of general scholastic aptitude and is
interpreted as one index of mental ability. The DAT has also been adapted by Indian
psychologists to suit local requirements.
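
The VR + NR composite and its conversion to a percentile rank can be illustrated with a small Python sketch. Everything numeric here is hypothetical: the norm-group composites and the examinee's raw scores are invented, and the percentile-rank formula used (percentage of the norm group scoring below, counting half of those tied) is one common textbook definition rather than the published DAT norm tables.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group below the score, counting half of ties."""
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    below = sum(1 for s in norm_scores if s < score)
    tied = sum(1 for s in norm_scores if s == score)
    return 100 * (below + 0.5 * tied) / len(norm_scores)


# Invented VR + NR composites for a tiny norm group of ten students.
norm_composites = [38, 45, 52, 52, 60, 67, 71, 74, 80, 88]

# Invented raw scores for one examinee.
verbal_reasoning = 34
numerical_reasoning = 33
composite = verbal_reasoning + numerical_reasoning  # the ninth score, VR + NR

print(composite, percentile_rank(composite, norm_composites))
```

In practice the percentile rank is read from norm tables built on a large standardization sample; the sketch only shows what the transformation does to the composite score.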





General Aptitude Test Battery (GATB)

The General Aptitude Test Battery (GATB) was developed by the United States Employment
Service in 1962 for use primarily in the armed services. The battery was constructed on the basis
of extensive factor analysis in which 59 tests were intercorrelated. On the basis of the analysis,
10 factors were chosen, which were subsequently reduced to nine factors measured by
12 tests. The nine factors of the GATB are: intelligence (G), verbal aptitude (V), numerical
aptitude (N), spatial aptitude (S), form perception (P), clerical perception (Q),
motor coordination (K), finger dexterity (F), and manual dexterity (M). Of the 12 tests developed
to measure the nine factors, eight are verbal tests and four are
nonverbal tests requiring simple apparatus for subjects. The tests for the
factors G through Q are available in alternate forms. The scores obtained on the nine factors of the
GATB are converted into standard scores with a mean of 100 and a standard deviation of 20 for
their interpretation. The whole battery takes about two hours and thirty minutes for its complete
administration. The GATB has been widely used in the employment services, too. The primary
limitation of the battery is that all its tests are highly speeded and some important aptitudes have
not been included. For example, a test of mechanical reasoning has not been included in
the battery.

Flanagan Aptitude Classification Tests


The Flanagan Aptitude Classification Test or FACT is another battery of multiple aptitude tests.
The battery has been primarily developed for vocational counselling and employee selection and
has been named after its author, Flanagan, who did the job analysis of several occupations,
which led to the emergence of 21 ‘job elements’ or abilities that discriminated well between
successful and unsuccessful workers on each job. For measuring 19 job elements out of 21,
verbal tests have been developed and for the remaining two, performance tests have been
developed. The whole battery requires three testing sessions and more than ten and a half hours
for its complete administration.


Armed Services Vocational Aptitude Battery (ASVAB) 


The Armed Services Vocational Aptitude Battery (ASVAB) is another popular multiple aptitude 
battery. It is designed for students in grades 11 and 12 as well as for postsecondary schools. It 
provides scores used in both educational and military settings. When used in military settings, 
ASVAB results can help identify such students who are fit for various types of military 
occupational programmes. 


ASVAB comprises ten subtests: general science, paragraph comprehension, arithmetic reasoning,
word knowledge, numerical operations, coding speed, auto and shop information, mathematics
knowledge, mechanical comprehension and electronics information. These ten subtests are
grouped into various composites. There are three types of academic composites: academic
ability, verbal ability and mathematical ability. Likewise, there are four occupational
composites: mechanical and crafts, business and clerical, electronics and electrical, and health
and social. Besides these, there is also an overall composite reflecting general ability. The overall
psychometric characteristics, that is, its reliability, validity, norms, etc., are excellent (Ree &
Carretta 1994, 1995).


Besides the above multiple aptitude batteries, there are some special aptitude tests. Although
nowadays not much emphasis is being given to the development of special aptitude tests, for
two reasons they are still being preferred. First, special aptitude tests provide more flexibility in
the selection of the appropriate and relevant test than the multiple aptitude batteries. Second, there
are some areas of aptitudes which are seldom included or well covered by the multiple
aptitude batteries. Artistic talents, musical talents, creative talents, vision, hearing and other areas
like some motor functions are examples of the common areas rarely included and covered
by the aptitude batteries. Some of the important special aptitude tests are presented below.

Sensory Tests

Among the many psychological tests, sensory tests devised by Galton and others to measure
intelligence were the first in vogue. But these tests could not gain prominence due to their failure to
predict intellectual accomplishment. Researches on the sensory capacities, however, were
continued after that time and today, there are sophisticated techniques for measuring visual and
auditory sensitivity. Vision is not a unitary capacity. It has several dimensions and a person who is
normal in one dimension may be impaired in another. For example, a person may be colour
blind but his vision may be perfectly normal otherwise. Among the visual dimensions which have
attracted the attention of psychologists for extensive studies are the visual near acuity
(usually measured at a reading distance of 15 to 18 inches) and far acuity (usually measured at the
distance of 20 feet), the visual distance or depth perception, the colour discrimination and the
phoria (that is, muscular control). The most common method of measuring far acuity is
the wall chart. One such chart is known as the Snellen chart, which consists of rows of letters of
gradually decreasing size. The Snellen chart is usually placed at a distance of 20 feet and the
person is asked to read the letters. The three most important and well-known instruments for
measuring the above visual dimensions are the Ortho-Rater, the American Optical
Sight Screener and the Keystone Telebinocular. These three instruments are common
measures in the selection of personnel in industry and are also used for screening purposes
in schools.

Audition is another important sensory field where psychologists have concentrated their
attention. Like vision, audition is also considered as being multidimensional. Of the several
dimensions, the most widely investigated dimension of hearing or audition is auditory acuity,
which is usually measured through pure-tone audiometers. Such audiometers permit the
measurement of auditory acuity at different points of sound frequency, either in the group or in
the individual testing situation. In testing the auditory acuity through audiometers, one ear is
tested at a time and the threshold for the ascending and the descending direction of sound is
determined at several frequency levels. In the ascending series the examiner starts with an
intensity too low to be heard and gradually increases it until the examinee perceives the sound
clearly; in the descending series the examiner starts with a sound which is clearly heard and
decreases it gradually until the examinee reports no sound. Usually, the sound is given to the examinee
through earphones or headphones. Recently, a test for measuring auditory acuity among children
has been developed. The test is known as the Massachusetts Hearing Test, which requires the
examinee to check Yes or No to tell whether or not he has heard the sound signal in each trial.
More than 40 children can be tested through this test at a time.
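
The ascending and descending procedure can be sketched as a small Python simulation. The sketch assumes that the quantity stepped up or down is the tone's intensity in decibels, tested separately at each frequency, and that the two series are averaged to estimate the threshold; the simulated listener, the 5 dB step, the frequencies and the "true" thresholds are all invented for illustration and do not describe any particular audiometer.

```python
def ascending_threshold(hears, start=0, step=5, max_level=100):
    """Raise the level from inaudible until the listener first reports the tone."""
    level = start
    while level <= max_level:
        if hears(level):
            return level
        level += step
    raise ValueError("tone was never heard")


def descending_threshold(hears, start=100, step=5, min_level=0):
    """Lower the level from clearly audible until the listener reports no tone."""
    level = start
    while level >= min_level:
        if not hears(level):
            return level
        level -= step
    raise ValueError("tone was always heard")


def estimate_threshold(hears):
    """Average the ascending and descending crossing points."""
    return (ascending_threshold(hears) + descending_threshold(hears)) / 2


# Invented 'true' thresholds (dB) of one ear at three test frequencies (Hz).
true_thresholds = {500: 22, 1000: 18, 4000: 32}

audiogram = {}
for frequency, true_level in true_thresholds.items():
    # Simulated listener: hears the tone whenever it reaches the true level.
    listener = lambda level, t=true_level: level >= t
    audiogram[frequency] = estimate_threshold(listener)

print(audiogram)
```

With these invented values the estimates come out within a step of the simulated thresholds (22.5, 17.5 and 32.5 dB). A real listener responds less consistently near threshold, which is one reason both directions are tested.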


Motor Dexterity Tests 


Motor dexterity tests are tests which measure the co-ordination of hand, arm and/or leg
movement in performing a task. There are several such tests that have been frequently used. The
Crawford Small Parts Dexterity test, the Stromberg Dexterity test, the Purdue Pegboard, the
Bennett Hand Tool Dexterity test and the Complex Co-ordination test are some of the examples of
motor dexterity tests.

Artistic Aptitude Tests 


Artistic aptitude tests are concerned with the measurement of artistic ability. Two types of artistic
aptitudes are commonly investigated today. One type of test is exclusively concerned with the
measurement of artistic appreciation and another type of test is solely concerned with the
measurement of the productive ability. These two separate types of tests have been developed on
the assumption that a person may have a better sense of appreciation and discrimination among
several paintings but still he may not possess the ability to paint. One of the earliest tests
of artistic appreciation was the McAdory Art Test developed by McAdory in 1929. But this test is
now out of use and is mentioned because of its historical importance only. The Meier Art Judgement
Test is one example of a test of artistic appreciation. The test was developed by Meier in
1929. According to Meier, artistic aptitude consists of six interrelated traits: manual skill,
aesthetic intelligence, volitional perseveration, creative imagination, perceptual ability and
aesthetic judgement. The Meier Art Judgement Test intends to measure only the last trait of the
artistic aptitude. In this test, each item consists of two very similar paintings or drawings. One
of each pair is done by a noted painter and the other represents a slight
variation of the original painting in terms of symmetry, shading, balance, use of colour, etc. The
examinee is required to choose the more appealing painting out of the two. Obviously, the test
provides a measure of the extent to which the aesthetic sense of the examinee is in agreement with the
views of art specialists. This is known as the judgement of aesthetic organization, which is
considered as one of the most important factors in artistic ability. The Graves Design Judgement
Test developed by Graves in 1948 and further revised in 1951 is another test of artistic
appreciation. Here each item consists of two or three designs in which one is original and the others are
altered versions of the original. Again, the examinee is required to choose the most appealing
design. Unfortunately, not much work has been done with this test and hence, it could not gain
much popularity. The Horn Art Aptitude Inventory is an example of a test of artistic production. 
The test is a Measure of the ability to produce graphic art. Hence it is an example of the work 
sample test. The test is solely concerned with the measurement of the ability to produce original 
drawing or painting. The test consists of two parts. In one part the examinee is required to make 
90 drawings of very common objects (such as tree, book, etc.), and of geometric figures within a 
specified period of time usually in the range of 3 to 10 seconds. This is known as the 
scribble-and-doodle test. The second part consists of 12 rectangles, each containing a few 
stimulus lines. The examinee is required to make a drawing or painting in each rectangle on the 
basis of his imagination in a way which includes those stimulus lines. This constitutes the test of 
imagery. The scoring ot the test is very subjective and hence, not much reliance can be placed 


upon this test. Other tests of artistic production are the Knauber Art Ability Test and the Lewerenz 
Tests in fundamental abilities of visual art. 


Musical Aptitude Tests 


For measuring musical aptitude, researches were carried out for nearly 40 years at the University
of Iowa in the USA under the general guidance of Seashore. As a consequence, Seashore, Lewis
and Saetveit (1939) developed a test of musical aptitude, which was known as the Seashore
Measures of Musical Talents. The test consists of six subtests: the pitch test, the loudness test,
the rhythm test, the time test, the timbre test and the tonal memory test. The test is meant for
students from grade 4 to adults. Each item of the Seashore test consists of two tones and the
examinee is required to make a judgement regarding the given characteristics of the tones. For
example, in the pitch test the examinee indicates whether or not the second tone is higher than
the first tone and gradually each pair in the pitch test is made more difficult by narrowing the difference
between the first and the second tone. In the rhythm test the examinee is required to compare the
rhythmic patterns of each pair, which are made either similar or different in each trial. The
loudness test requires the examinee to compare the two tones in each trial and determine
whether or not the second tone is stronger than the first tone. In the time test, the examinee
is to report whether or not the second tone of each pair is longer in duration than the first tone. In
the timbre test the examinee is required to discriminate between the tonal quality of the two tones
in each pair, the quality of the two tones being made sometimes equal and sometimes different. In the
tonal memory test, two short series of notes (each consisting of three to five tones) are played and
in the second series one of the notes is changed. The examinee is required to detect the changed
note. The test does not yield a composite score for the entire test; rather, total scores for each
subtest are computed and each total score is converted into percentile norms. Psychologists
are of the view that the test does not yield very meaningful results for a child who is below 10 and
have argued that the test is only a measure of certain types of sensory discrimination, which
is essential but not sufficient for musical aptitudes (Nunnally 1970).
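
The conversion of each subtest total into a percentile norm can be illustrated with a short sketch. This is not the published Seashore norming procedure: the norm-group totals, the helper name `percentile_rank` and the convention of counting half of the tied scores are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring below raw_score,
    counting half of those who tie with it (an assumed convention)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)
    ties = bisect_right(ordered, raw_score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm-group totals for a single subtest (say, pitch).
pitch_norms = [22, 25, 27, 30, 30, 33, 35, 38, 41, 44]

# Each subtest total is converted on its own; no composite is formed.
examinee_pitch_total = 33
pitch_percentile = percentile_rank(examinee_pitch_total, pitch_norms)
```

In practice the same conversion would be repeated separately for each of the six subtests, each against its own norm table, which is why the result is a profile of six percentiles rather than one overall figure.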
The Wing Standardized Test of Musical Intelligence is another test of musical aptitude, which
was developed by Wing (1941, 1962), a British psychologist. It is preferred to the Seashore test by
most of the teachers of music. The test has seven subtests such as the chord analysis, the pitch
discrimination, the tuning for pitch, the harmony, the rhythm, the intensity and the phrasing. The
first three tests require sensory discrimination at a level more complex than that of the Seashore test and
in the last four tests, the examinee is required to compare the aesthetic quality of the two versions.
The first three tests may also be used as a separate battery for younger children or for
screening purposes. The entire test takes about one hour but it may also be administered in
two sessions. Norms are available for the entire battery as well as for the first three subtests
separately. The reliability and the validity of the test are fairly high and satisfactory.


Tests of Mechanical and Clerical Aptitudes


Although tests of mechanical and clerical aptitudes have been included in the multiple aptitude
batteries, they have also been constructed as full independent tests. There are two probable
reasons. First, it is not possible for any test constructor to give as much emphasis as these tests
require when placed in a battery of aptitude tests. Second, these two tests are more commonly
used for a variety of purposes and as such, a separate status for them is essential.


A test of mechanical aptitudes is solely concerned with mechanical information,
mechanical reasoning or mechanical comprehension. In some tests of mechanical aptitudes two
factors, namely, the perceptual aptitudes and the spatial aptitudes, play an important role
whereas in some other tests, only items regarding the appraisal of mechanical reasoning and
mechanical information are included. Examples of the former type of mechanical aptitude test
are the Minnesota Paper Form Board test and the Space Relations test of the Differential Aptitude
Test (DAT), and examples of the latter type of mechanical aptitude tests are the Mechanical
Reasoning test of the DAT, the SRA Mechanical Aptitude test, and the Bennett Mechanical
Comprehension test.


Tests of clerical aptitudes mainly emphasize perceptual speed and accuracy. The two
well-known examples are the Clerical Speed and Accuracy Test of the DAT and the Minnesota
Clerical Test. The Clerical Speed and Accuracy Test consists of two parts, each having 100
items, which require the examinee to compare letter and number combinations within a
limited time. Each item in each part has five combinations, one of which is underlined. The
examinee is to find the same one in the answer sheet and draw a line under it. Part I is the practice
form and Part II is the actual form, which is scored. Each correct answer is given a score of one
and thus, the maximum score that an examinee can earn is 100. The Minnesota Clerical Test also
consists of two subtests, namely, the Number Comparison and the Name Comparison, each
timed separately. In the Number Comparison Test, the examinee is presented with pairs of digits
ranging from 3 to 12 and is required to put a mark on the blank space left between the two sets of
digits. In the Name Comparison Test, proper names are given in place of the digits and the
examinee is required to do the same task.
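
The scoring rule described for Part II of the Clerical Speed and Accuracy Test (one point per correct answer, maximum 100) can be sketched as below. The answer key, the coding of the five combinations as 0 to 4 and the function name `score_part_two` are hypothetical; they only illustrate the rule and are not taken from the test manual.

```python
def score_part_two(responses, answer_key):
    """One point for each item whose marked combination matches the key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and key must cover the same items")
    return sum(1 for given, correct in zip(responses, answer_key)
               if given == correct)


# Hypothetical 100-item key; each item offers five combinations (0-4).
answer_key = [i % 5 for i in range(100)]

# An examinee who reaches 80 items within the time limit and marks all
# of them correctly; None stands for an item that was not attempted.
responses = answer_key[:80] + [None] * 20
score = score_part_two(responses, answer_key)
```

Because the test is speeded, unattempted items simply earn no credit, so the score reflects both how far the examinee got and how accurately he or she worked.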


Achievement Tests 


Achievement tests (also known as proficiency tests) are another important frequently used tool in
the field of psychological and educational researches. Before discussing the details of the
different types of achievement tests, it is essential to discuss some of their important uses. The
main uses of achievement tests are given below:

1. Achievement tests are an effective way to check any weakness in the instructions or even
the instructor. If a weakness is found in the instruction, the instructor or
the teacher is advised to improve his instruction so that it may improve the subsequent
instruction. Likewise, a student showing slackness may be motivated so that
he may benefit from the subsequent instructions.

2. Achievement tests are also effective in the formulation of educational goals and provide a
very easy means for the critical examination of the content and method of instruction. A
well-constructed achievement test is an effective way of producing and illustrating the major elements
in any educational goal, which is often understood in terms of some commonly agreed methods
and instructions by which its attainment is to be assessed. Achievement tests also
indicate the adequacy of the content of the course. How much of the content can be
dropped and how much can be substituted by new content? What are the
misunderstandings encountered in the course of content? How can the learners be
made interested in bringing concrete changes in the course content? By providing answers to
these questions and the like, achievement tests provide aids in the critical evaluation of the
content and methods of teaching.

3. Achievement tests also help in adapting the instruction to the individual need of the
learners. The performance on the achievement tests directly reveals the need for further guidance
to be given to each learner and accordingly, the instruction can be modified to suit the individual
need. If a learner has poor performance on the achievement test in spelling English words, it
obviously indicates that the learner is in need of special guidance and course training in the
spelling of English words. As such, the general instruction may be imparted in a way that suits this
individual need satisfactorily.

Like aptitude tests, achievement tests can also be classified into two types: the general
achievement batteries and the special achievement tests. The achievement batteries attempt to
measure the general educational achievement in several areas of the academic curriculum. They
can be used from primary levels in schools to adult levels in colleges but most achievement
batteries have concentrated upon the primary and the secondary levels in school. Some of the
important achievement batteries developed so far are the Iowa Tests of Educational
Development, the California Achievement Tests, Iowa Tests of Basic Skills and SRA Achievement
Series, all of which emphasize educational skills such as arithmetical skill, spelling and
reading skill and work-study skills (that is, map and graph readings). Likewise, the Sequential Tests
of Educational Progress (STEP) is another common example of the achievement battery, which
consists of six subtests: reading comprehension, mathematics, science, social studies, listening
comprehension and writing. Each subtest has multiple-choice items and is available in parallel
forms. Adult Basic Learning Examination (ABLE) is also a battery of achievement tests specially
meant for undereducated adults, mostly coming from economically and socially disadvantaged
families. The ABLE comprises four subtests: vocabulary, reading, spelling and arithmetic. The
test is meant for grades 1 through 4 (Level I) and grades 5 through 8 (Level II) and is available in
parallel forms at both the levels.

Special achievement tests meant for measuring the achievements of pupils in some selected
areas may conveniently be grouped into two distinct groups: the diagnostic tests and the
standardized end-of-course examinations (Anastasi 1968). The primary purpose of diagnostic
tests is to identify the educationally retarded pupils and to suggest remedial programmes for
them. Such tests are available in special areas like reading skill and mathematical skill. The
Stanford Diagnostic Reading Test is an example of a diagnostic test in reading skill. The test is a
group test and is meant for elementary school pupils. It is available in two equivalent forms at
level I, which is meant for grades 2.5 to 4.9 and at level II, which is meant for grades 4.5 to 8.5.
Different dimensions of reading skill such as reading comprehension, vocabulary and
word-recognition skills are measured by this test. The Durrell Analysis of Reading Difficulty is
another diagnostic test in reading skill. The Durrell tests provide scores for the comprehension of
oral and silent reading, rapid word recognition, listening comprehension and word analysis. The
Stanford Diagnostic Arithmetic Test is an example of a diagnostic test in mathematical skill. This
test is very similar to the Stanford Diagnostic Reading Test in its administration. It is also a group
test and is available in two equivalent forms at level I and level II. Level I is for grades 2.5 to 4.5
and level II is for grades 4.5 to 8.5. The standardized end-of-course examinations are the
co-ordinated series of achievement tests for different subjects taught at the high-school and college
level. Since these are the co-ordinated series of achievement tests in different subjects, they
provide one system of comparable norms for all tests (of the different subjects) and thus, a direct
comparison of scores obtained in different subjects by the same testee is possible. Usually, the
scores obtained on the different tests are transformed into a single scale of standard score
with a fixed mean and standard deviation.
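
The transformation of raw scores from different subjects into a single standard-score scale can be sketched as follows. The text does not state which mean and standard deviation are fixed, so the values 50 and 10 used here, the function name `to_standard_scale` and the norm-group data are assumptions for illustration only.

```python
from statistics import mean, pstdev


def to_standard_scale(raw_scores, new_mean=50.0, new_sd=10.0):
    """Rescale raw scores so the group has the chosen mean and SD."""
    group_mean = mean(raw_scores)
    group_sd = pstdev(raw_scores)
    if group_sd == 0:
        raise ValueError("all raw scores are identical; cannot rescale")
    return [new_mean + new_sd * (x - group_mean) / group_sd
            for x in raw_scores]


# Hypothetical norm-group raw scores on two end-of-course tests.
physics_raw = [40, 50, 60]
history_raw = [70, 80, 90]

physics_std = to_standard_scale(physics_raw)
history_std = to_standard_scale(history_raw)
```

Here a raw score of 50 in physics and a raw score of 80 in history both map to 50 on the common scale, which is what makes a direct comparison across subjects possible for the same testee.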




Essay-Type Tests Compared to Achievement Tests

Essay-type tests are those tests which require the student to make a written response to
some questions that tend to reveal some information regarding the structure, dynamics and
functioning of the student's knowledge as it (knowledge) has been modified by a particular set of
learning experiences. Such tests have been very commonly used for measuring a student's
achievement in the class. When they are used for measuring the student's achievement in a
classroom by a teacher, they are called teacher-made classroom achievement tests. Essay tests as
measures of achievement are popular because of some potential advantages. Essay items or
questions require students to communicate effectively. Here they plan their answers in such a
way that there is a meaningful organisation. Moreover, essay items are not difficult to prepare if
the teacher is well-versed and well-informed in the field.

But use of essay tests as measures of achievement has progressively decreased because of 
some inherent weaknesses like high cost of reading essay answers, difficulty of obtaining reliable 
and objective scoring, etc. Now essay tests are gradually being replaced by standardized 
achievement tests where objective items are used. The biggest merits of standardized 
achievement tests are that in a relatively shorter period of time, comparatively more items can be 
answered and the scoring can be done in a perfectly objective way. 

One basic point of distinction between the essay test and achievement test is that the essay 
test can be used for measuring intelligence, aptitude and even personality adjustment (especially 
with the help of short-answer essay questions or items) apart from classroom achievement 
whereas achievement tests, whether in the form of essay items or objective items, can be used 
only for measuring proficiency acquired by the students in a field after the training. Such tests 
become useless in the context of intelligence, aptitude or adjustment. 


Limitations of Achievement Tests 


Although achievement tests are very important tests for making decisions about a pupil's
performance, they have some limitations as mentioned below.

1. Scores on achievement tests cannot be taken as a basis for deciding students' promotion
to the next grade. This is because most of the achievement tests are not foolproof measures of
achievement in the concerned field. Many factors may determine the classroom achievement of
a pupil but those factors may not be adequately assessed by the achievement test.

2. Achievement tests, as compared to intelligence tests and aptitude tests, are difficult
to construct.

3. Sometimes achievement test results are taken as a measure of the teacher's effectiveness.
But this is not a correct interpretation of the results because the factors that affect a pupil's
achievement are numerous, of which the teacher's quality is only one.


TESTS OF CREATIVITY 


Tests of creativity measure divergent thinking (Guilford 1967). Divergent thinking refers to that
type of thinking where the individual expands to many correct solutions of problems from
a given piece of information. In this way, divergent thinking differs from convergent
thinking, where the individual tries to synchronize or reduce several pieces of information into one
correct answer. Convergent thinking is mostly measured by the general mental ability tests. In
view of this, tests of creativity require more than
one correct and appropriate response to the given situation. The following are the brief
descriptions of the two important tests of creativity.
1. The Torrance test of creativity was developed by Torrance. It consists
of two sections: verbal and figural. The verbal section has three subsections: (1) ask-and-guess, (2) product
improvement and (3) unusual uses. In the ask-and-guess subsection, the examinee is shown a
picture and asked to describe what had led to the scenes in the picture and what might happen
next; in the product improvement subsection, a picture of a toy is given to the examinee, who is
encouraged to suggest changes in the toy so as to make it more fun for playing; and in the unusual
uses subsection the examinee is asked to give as many unusual uses as he or she can of a given
object. The figural section requires the examinee to draw a picture representing an object and to
tell an interesting and exciting story with a picture. Each subtest of the figural section gives
something to the examinee to start with. For example, he may be given a circle with the
instruction to construct as many figures as possible, with the circle as the key part in each figure.
The Torrance test can be used with both small children and adolescents. The test is administered
individually and orally for children below the fourth grade. On the basis of this test, three scores
are given to each examinee: fluency (which indicates the total number of acceptable responses),
flexibility (which indicates the number of categories in the manual used by the examinee) and
originality (which indicates the number of responses not found in the list of frequent responses).
No norms are available and therefore, each examinee's total score is evaluated against a set of
common criteria for creative achievement.
2. The Remote Associates Test (RAT) is another test for creativity, which has been developed
by Mednick and Mednick in 1971. The test is meant for high-school students and consists of 40
items. The items of the test are such that the examinee is presented with three words and asked to
give the fourth word which may be related to each of the three words. Those three words are
thought to indicate remote associative clusters and the fourth word a mediating link. One
minute is allowed for each item and thus, the entire test takes about 40 minutes. One serious
criticism of the RAT is that its validity is not established. Due to the lack of validity, Worthen and
Clark (1971) contended that the RAT was a better measure of sensitivity to the structure of
language rather than a measure of creative achievement.
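
The three Torrance scores described above (fluency, flexibility and originality) can be illustrated with a small sketch. The category table, the list of frequent responses and the function name `score_torrance` are invented for illustration; the actual categories and frequency lists come from the test manual and are not reproduced here.

```python
def score_torrance(responses, category_of, frequent_responses):
    """Return fluency, flexibility and originality for one examinee.

    responses: answers given by the examinee
    category_of: maps each acceptable response to a manual category
    frequent_responses: responses listed as commonly given
    """
    # Only responses recognised in the (hypothetical) manual count.
    acceptable = [r for r in responses if r in category_of]
    fluency = len(acceptable)
    flexibility = len({category_of[r] for r in acceptable})
    originality = sum(1 for r in acceptable
                      if r not in frequent_responses)
    return {"fluency": fluency,
            "flexibility": flexibility,
            "originality": originality}


# Made-up scoring tables for an unusual-uses item.
category_of = {
    "store toys": "container",
    "plant pot": "container",
    "sled": "vehicle",
    "drum": "musical instrument",
}
frequent_responses = {"store toys", "plant pot"}

responses = ["store toys", "sled", "drum", "eat it"]
scores = score_torrance(responses, category_of, frequent_responses)
```

In this made-up case the response "eat it" is not acceptable and earns nothing, while "sled" and "drum" add to originality because they are absent from the list of frequent responses.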


Review Questions

1. Is intelligence a combination of groups of traits or factors? Justify your answer.

2. Discuss the two-level process of intelligence. Does the global capacity view of
intelligence replace the two-level-process view of intelligence?

3. Discuss the research utilities of the important types of intelligence tests in psychological
research.

4. Distinguish between aptitude tests and achievement tests. Point out some of the major
limitations of achievement tests.

5. Make a comparative study of essay-type tests and achievement tests.

6. Make a comparative study of Wechsler scales and Binet scales.

7. Distinguish between SB4 and SB5.

8. Write short notes on the following:
(a) Intelligence quotient
(b) Intelligence as structure of intellect


10
MEASUREMENT OF PERSONALITY

CHAPTER PREVIEW

• Meaning and Purpose of Personality Measurement
    Social Traits
    Motives
    Personal Conceptions
    Adjustment
• Tools of Personality Assessment
    Self-report Inventories
    Observational Methods
    Projective Techniques
• Popular Strategies Involved in Construction of Personality Inventories
    Personality Self-report Inventories
    The Logical-Content Strategy
    The Criterion-Group Strategy
    The Factor Analytic Strategy
    The Theoretical Strategy
    Combination Strategies
    Self-report Inventories in India
• Overcoming Distortions in Self-report Inventories
    Establishment of Rapport
    Use of Forced-Choice Technique
    Concealing the Main Purpose of the Test
    Use of Verification and Correction Keys
• Situational Tests
• Measurement of Interests, Values and Attitudes
    Meaning and Types of Interest Test
    Value Tests
    Measures of Attitudes


MEANING AND PURPOSE OF PERSONALITY MEASUREMENT


The term 'personality' refers to the total functions of an individual who interacts with the
environment. Such a definition automatically includes all traits as the main theme of
personality. The purpose of the measurement of personality is to describe a person in terms of different
traits. Trait is nothing but the observed consistency of behaviour of a person. Traits are not directly
observed but inferred from the consistent behaviour of an individual. The usual cues to
traits are what the person does, how he does it and how well he does it. Nunnally
opined that personality measurement aims at studying the following four broad types of traits:




Social Traits

Social traits are those traits which determine how persons interact with other persons in the
society. Examples of typical social traits are friendliness, honesty, dominance, responsibility,
shyness, etc. Thus, within social traits are included traits related to temperament and character.


Motives

Motives here include the nonbiological drives such as need to earn money and prestige, need for
achievement, need for affiliation, aggression, etc. These nonbiological needs are often
said to constitute what is known as personality dynamics.


Personal Conceptions

Under the trait of personal conceptions are included those traits which determine people's
attitude toward the self and others, a person's values, interests, etc.

Adjustment 

Adjustment includes traits like the freedom from emotional worries or instability and other 
disruptive behaviour. Maladjustment is the opposite of adjustment and includes pathological 
traits like hallucinations, hysteria, imaginary illness, psychoses, etc. 


One general characteristic of these four types of traits is that they are correlated with each 
other, that is, they are not independent. For example, a social trait, say dominance, is likely to 
influence motives, personal conceptions (that is, interests, attitudes, etc.) and adjustment. 
Likewise, an individual’s motive is likely to influence his interaction with others in society (that is, 
social traits), his personal conception and his adjustment, too. If an individual is highly 
prejudiced against a particular caste (an example of personal conception), his social interaction, 
motives and adjustment would all be similarly affected. Likewise, if a person has a satisfactory 
general adjustment, his social traits, motives and personal conception would be in congruence 
with social norms. On the other hand, if an individual is maladjusted, he would have extreme 
social traits, eccentric motives and personal conceptions. 


Personality measurement aims at measuring the above four broad categories of general 
traits. For the measurement of social traits, motives and adjustment, more or less a similar set of 
principles is found but to measure personal conception, which includes attitude, interest and 
values, a different set of principles is necessary. 


TOOLS OF PERSONALITY ASSESSMENT 

There are three most common tools (or methods) of personality assessment.
1. Self-report personality inventories, or structured personality tests
2. Observational methods including situational tests
3. Projective techniques


A brief discussion of each of these is given below.


Self-report Inventories

Self-report inventories, also known as personality inventories, are the self-rating questionnaires,
where the individual describes his own feelings, environment, and reactions of others towards
himself. In a nutshell, on the self-report inventories a person reports about himself in the light of
the questions (or items) put therein. Hence, the method is known as a self-report inventory.
Self-report inventories are further classified into the following five types:

1. Inventories that attempt to measure social and certain other specified traits such as
self-confidence, dominance, ego-strength, extroversion, responsibility, etc. The Bernreuter
Personality Inventory is among the examples of this category.

2. Inventories that attempt to evaluate the adjustment of persons to different aspects of the
environment, such as school, home, health, etc. The Bell Adjustment Inventory is the
best example.

3. Inventories that attempt to evaluate pathological traits such as hysteria, paranoia,
hypomania, depression, schizophrenia, etc. The Minnesota Multiphasic Personality Inventory
(MMPI) is the best example.

4. Inventories that attempt to screen individuals into two or three groups. The Cornell Index
is the best example of such an inventory. It screens the persons into two groups: those having
psychosomatic difficulties like asthma, peptic ulcers, migraine, convulsive disorders, etc., and
those not having them, that is, those who are normal.

5. Inventories that attempt to measure attitudes, interests and values of persons. The Kuder
inventories (vocational, occupational and personal), the Strong Vocational Interest Blank and the
Allport-Vernon Study of Values are some of the best examples of this category of
self-report inventories.


It does not follow, however, that the above five classifications of inventories have nothing in
common. In reality, the classification is based upon the purpose and the nature of item content.
All the above self-report inventories are based upon the same principle, which states that
behaviours are nothing but the manifestation of traits and one can find out the presence or
absence of a trait by means of assessing the behaviour.


A review of some of the representative personality inventories has been done later in
this chapter.


Observational Methods 


Observational methods are distinct from self-report inventories. Observational methods provide
either a structured or unstructured situation. A structured situation is a controlled situation
whereas an unstructured situation is an uncontrolled situation. Persons whose personality traits
are to be observed are put in either of these two situations and careful, impartial and accurate
observations are made by the observers. The observation becomes the basis for assessing the
personality traits. In some observational methods, however, a departure is made from the above
set procedure. The differences in observations made by different observers reflect the subjectivity
in the observations of the observers. A detailed discussion of the observation method appears in
Chapter 12.


Projective Techniques 


Projective techniques are the most popular method of assessing the personality of the subject.
As its name implies, the person whose traits are to be studied is asked to describe an unstructured
stimulus or situation and through his responses, his needs, drives, motives, fears, etc. are
revealed. (For a complete discussion of projective techniques, refer to Chapter 11.)


POPULAR STRATEGIES INVOLVED IN CONSTRUCTION OF PERSONALITY
INVENTORIES

Personality Self-report Inventories

Self-report personality inventories (also technically called structured personality tests) are the
most popular measure of personality. In fact, structured personality tests evaluate
personality traits, personality types, personality states and some other related aspects including
self-concept. Briefly, personality traits refer to a relatively enduring disposition to think, feel or act in a
consistent manner. Personality types refer to a general description of people. Personality states
refer to emotional reactions that vary from one situation to another. Self-concept refers to an
organized and consistent set of assumptions that a person has about himself or herself.

In a structured personality test, the subject or testee is presented with statements and responds
to them in some way, such as 'True' or 'False', to indicate whether they apply to him or her.
That is the reason why it is called the structured or objective method of personality assessment,
as distinguished from the projective method. Here each statement is structured and lacks
ambiguity whereas in the projective method of personality the stimulus is ambiguous and the
testee has no definite guidelines about what type of response is needed.


A review of literature reveals that the construction of structured personality tests is based
upon certain popular strategies. There are five such strategies: the logical-content strategy, the
criterion-group strategy, the factor-analytic strategy, the theoretical strategy and the mixed
strategy. We shall discuss each of these strategies separately.


The Logical-Content Strategy 


As its name implies, in this strategy the test constructor uses various types of reason and deductive
logic in the development of the test. This strategy assumes that the test item describes the person's
personality and behaviour. For example, if the testee or subject marks 'True' for the statement
'I like to participate in social activities', then testers assume that he or she really likes social
participation. Thus the test constructor assumes the face validity of a test response. Initial efforts
to assess personality used this logical-content approach as the primary approach.

Specific tests of personality were not developed until World War I, when psychologists
realized the need to distinguish people on the basis of emotional well-being. To meet this need,
the first personality inventory was developed by Woodworth (1920) and it was named the
Woodworth Personal Data Sheet, whose purpose was to identify military recruits likely to
develop emotional problems in combat. It consisted of 116 questions to which the testee
responded either ‘Yes’ or ‘No’. The items of the test had been selected from known symptoms of
emotional disorders and from the questions asked by clinical psychologists and psychiatrists in
their screening interviews. The test produced a single overall score providing a global measure of
personality functioning.


The success of the Woodworth Personal Data Sheet in solving the problem of mass screening
attracted several other psychologists who devoted attention to the development of
logical-content tests of personality. Two of the best-known early tests based on the logical-content
strategy were the Bell Adjustment Inventory and the Bernreuter Personality Inventory. The Bell
Adjustment Inventory assessed adjustment of persons in various areas such as home life, social
life, health and emotional life. The Bernreuter Personality Inventory measured six personality
traits like sociability, confidence and introversion. Both these tests were originally published in
the 1930s and yielded not one but several scores. In fact, this laid the foundation of
many modern tests of personality that yield multiple scores.
Although the logical-content approach was very popular during the initial phase of
personality test construction, its weaknesses soon became evident. These weaknesses were
mainly related to its assumptions. The logical-content approach assumed that the subject
complies with the given instructions and provides an honest response; that the subject goes through
the test items and is also capable of evaluating his or her own behaviour; and that the subject, the
test constructor and the test user all define the items of the test in a similar manner. A wide
variety of research questions the validity of all these assumptions. In fact, none of these
assumptions was necessarily true.


Structured personality tests based on the logic of face validity were so sharply
criticized that such tests were more or less fully discarded (Ellis 1946; Landis 1936; McNemar &
Landis 1935). As a consequence, a new strategy called the criterion-group strategy emerged as a
new conceptualization in personality test construction.







The Criterion-Group Strategy


The criterion-group strategy is another method of personality test construction. It is in fact
an improvement over the logical-content strategy: an empirical approach that relies on
data collection and various types of statistical analysis for determining
the meaning of a test response or the nature of personality (Kaplan & Saccuzzo 2001). In this strategy
of personality test construction, the test constructor selects two groups of persons: the criterion
group and the control group. The criterion group consists of persons or individuals who share
certain characteristics such as hysteria or schizophrenia. The control group consists of persons
from the general population. The test constructor then administers a set of items to both groups.
Items that make a distinction between the criterion group and the control group are selected for the
test. Thus the actual content or face validity of an item in the
criterion-group strategy is of little value. Subsequently, the items are cross-validated by
checking how well the scale distinguishes an independent criterion group of individuals, also
known to possess the characteristics to be measured, from a control group. If the scale is found to
distinguish these two groups then it is said to have been cross-validated. Subsequently, data from
normal control groups are used to obtain standard scores. Thus one can easily determine how far
above or below the mean of the normal group each new subject scores in terms of
standardized units. After the test has been constructed and cross-validated, some additional
research is conducted to ascertain empirically what it means when the subjects endorse many
items of a particular dimension of the test.
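
The item-selection and cross-validation steps described above can be sketched in a few lines of
Python. This is only an illustration under assumptions that are not from the text: responses are
coded 1 for ‘True’ and 0 for ‘False’, an item is retained when the endorsement rates of the two
groups differ by at least a hypothetical cut-off `min_gap`, and the function names are invented
for the example.

```python
from statistics import mean

def endorsement_rate(group, item):
    """Proportion of a group answering 'True' (coded 1) to one item."""
    return mean(person[item] for person in group)

def select_items(criterion, control, min_gap=0.30):
    """Keep items whose endorsement rates differ between the two groups.

    Returns {item_index: keyed_response}; the keyed response is the answer
    given more often by the criterion group. min_gap is an assumed cut-off.
    """
    key = {}
    n_items = len(criterion[0])
    for item in range(n_items):
        gap = endorsement_rate(criterion, item) - endorsement_rate(control, item)
        if abs(gap) >= min_gap:
            key[item] = 1 if gap > 0 else 0
    return key

def scale_score(person, key):
    """Raw score: number of retained items answered in the keyed direction."""
    return sum(1 for item, keyed in key.items() if person[item] == keyed)

# Toy data: rows are persons, columns are four trial items.
criterion_group = [[1, 0, 1, 1], [1, 0, 1, 0], [1, 1, 1, 1], [1, 0, 0, 1]]
control_group = [[0, 0, 1, 1], [0, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 1]]

key = select_items(criterion_group, control_group)

# Cross-validation: the key should also separate fresh, independent groups.
new_criterion = [[1, 0, 1, 1], [1, 1, 0, 0]]
new_control = [[0, 0, 1, 1], [0, 1, 1, 0]]
crit_mean = mean(scale_score(p, key) for p in new_criterion)
ctrl_mean = mean(scale_score(p, key) for p in new_control)
```

Note that the content of an item plays no part in `select_items`; only the difference between the
groups decides whether it is kept, which is the sense in which face validity is of little value here.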


Here we shall discuss two well-known tests which have been developed using the
criterion-group strategy: the Minnesota Multiphasic Personality Inventory and the California
Psychological Inventory.


Minnesota Multiphasic Personality Inventory (MMPI) 


The Minnesota Multiphasic Personality Inventory (MMPI) is a true-false self-report
questionnaire measure of personality. This inventory was developed in the early 1940s by
Hathaway & McKinley (1943). It is a very important means for detecting disabling psychological
abnormalities. It is called multiphasic because it was designed to detect several psychiatric and
pathological problems. Originally, this inventory had 550 items. Each item of this original
inventory had three options: ‘True’, ‘False’, ‘Cannot say’. There were ten clinical scales and four
validity scales. The names of the clinical scales are:


1. Hs: Hypochondriasis
2. D: Depression
3. Hy: Hysteria
4. Pd: Psychopathic deviate
5. Mf: Masculinity-femininity
6. Pa: Paranoia
7. Pt: Psychasthenia
8. Sc: Schizophrenia
9. Ma: Mania
10. Si: Social introversion

The clinical scales were designed to identify psychological disorders like hysteria,
depression, etc. The validity scales were intended to measure test-taking attitude and to assess
whether the subject was demonstrating a normal, honest approach to the test. The four validity
scales are the Lie scale (L), Infrequency scale (F), Correction scale (K) and Cannot Say (or ?).
Although these scales are called validity scales, they are not concerned with validity in the
technical sense. In reality, these validity scales represent checks on carelessness,
misunderstanding, malingering and other different kinds of response sets and test-taking
attitudes. In particular, the L scale is designed to detect subjects who try to present themselves
in an overly favourable light. The F scale consists of items that are endorsed infrequently (by
less than 10%) in the normal population. High scores on the F scale invalidate the profile (Shores &
Carstairs 1998) and are characteristic of individuals who attempted to fake bad. The K scale serves
the same purpose as the L scale but it had been empirically constructed. A high score on the K
scale may indicate defensiveness or an attempt to ‘fake good’, and a low K score tends to
represent excessive frankness, self-criticism or a deliberate attempt to ‘fake bad’. The Cannot
Say score is simply the number of items omitted or double marked. Omission of up to 10
items appears to have little effect on the overall test results, but if this count exceeds 30 items the
test record is considered highly suspect and probably becomes invalid (Anastasi & Urbina 1997).
Thus, unlike the other three validity scales, it has no items of its own.
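
The rule of thumb for the Cannot Say score can be written as a small check. The thresholds of 10
and 30 come from the passage above; the function name, the coding of omitted or double-marked
items as `None`, and the label for the range between 10 and 30 (on which the text is silent) are
assumptions made for this sketch.

```python
def cannot_say_status(responses):
    """Classify a protocol by its Cannot Say (?) score.

    responses: list with True/False for answered items and None for items
    that were omitted or double marked (an assumed coding).
    """
    cannot_say = sum(1 for r in responses if r is None)
    if cannot_say <= 10:
        return cannot_say, "little effect on results"
    if cannot_say > 30:
        return cannot_say, "highly suspect, probably invalid"
    # The text gives no rule for 11 to 30 omissions; this label is a placeholder.
    return cannot_say, "interpret with caution"

# A hypothetical 550-item protocol with 35 unanswered items.
protocol = [True] * 300 + [False] * 215 + [None] * 35
score, status = cannot_say_status(protocol)
```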

The subject obtains a raw score on each of the 10 clinical scales based on the number of
items he or she has marked in the scored direction. Subsequently, raw scores are converted into
McCall’s T scores with a mean of 50 and a standard deviation of 10.
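
In its simplest linear form, the conversion to a T score is T = 50 + 10 (X - M) / SD, where X is the
raw score and M and SD are the mean and standard deviation of the raw scores in the normal group.
The short sketch below applies this formula; the normative values are invented for illustration and
are not MMPI norms.

```python
def t_score(raw, norm_mean, norm_sd):
    """Linear T score: mean 50, standard deviation 10 in the normative group."""
    z = (raw - norm_mean) / norm_sd
    return 50 + 10 * z

# Hypothetical norms for one scale: mean raw score 12, SD 4.
elevated = t_score(20, norm_mean=12, norm_sd=4)   # 70.0, two SDs above the normal mean
average = t_score(12, norm_mean=12, norm_sd=4)    # 50.0, exactly at the normal mean
```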


Recently, the original MMPI has been revised and reconstructed into two separate versions: 
MMPI-2 (Butcher et al. 1989) and the MMPI-Adolescent (MMPI-A) (Butcher et al. 1992). 


The MMPI-2 consists of 567 affirmative statements to which the subject responds with
either ‘True’ or ‘False’. The first 370 items, which are more or less identical to MMPI except for
editorial changes, provide the responses needed to score the original 10 clinical and 4 validity 
scales. Thus, all the 10 clinical and four validity scales have been retained in MMPI-2. The 
remaining 197 items (of which 107 are new) constitute the full complement of 104 new, revised 
content, and supplementary scales and subscales that make up the complete inventory. The basic 
profile form for the MMPI-2 includes 10 clinical and four validity scales taken over from the 
original version. Apart from this, there are separate profile forms for 15 content scales, 27 content 
component scales, 21 supplementary scales and 28 Harris-Lingoes subscales. All these scales of 
MMPI-2 are scored by using the MMPI-2 normative sample of 2600 adults having an age range of 
16 to 84 years. Some important examples of content scales are HEA (health concerns), TPA which 
evaluates the hard-working, irritable, impatient Type-A personality, FAM (family problems) which 
evaluates family disorders and possible child abuse and WRK (work interference) that examines 
the behaviours and attitudes likely to interfere with work performance. Examples of 
supplementary scales are Anxiety, Repression, Ego-Strength, and MacAndrew Alcoholism
Scale-Revised.


One important feature of the MMPI-2 is the inclusion of three new validity scales that can
help to assess the care and veracity with which the subject responds to the inventory. They are the
Back F (FB) scale, the Variable Response Inconsistency Scale (VRIN) and the True Response
Inconsistency Scale (TRIN). The FB scale is basically an extension of the original F scale for items
that appear in the second half of the inventory and provides a check on validity and correction
throughout the test. The VRIN scale attempts to evaluate random responding and the TRIN scale
attempts to assess acquiescence tendency, that is, the tendency to agree or mark ‘True’ regardless of
the content. The VRIN consists of matched pairs of items that have similar content. When the
subject marks the items of a pair in opposite directions, he or she earns a point on the scale. The
TRIN scale also consists of matched pairs of items, but with opposite content. For example, to receive
a point on the TRIN scale, the subject should mark ‘True’ to both items like ‘I feel disturbed most of
the time’ and ‘I remain relaxed most of the time’.
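
The logic of the two inconsistency scales can be illustrated as follows. The item numbers, the
pairs and the function names are hypothetical, and the actual MMPI-2 scoring rules are more
elaborate; the sketch only shows the idea of counting inconsistent pairs as described above.

```python
def vrin_score(answers, similar_pairs):
    """Count pairs of similar-content items answered in opposite directions."""
    return sum(1 for a, b in similar_pairs if answers[a] != answers[b])

def trin_true_score(answers, opposite_pairs):
    """Count pairs of opposite-content items that are both marked 'True'."""
    return sum(1 for a, b in opposite_pairs if answers[a] and answers[b])

# answers maps a hypothetical item number to True/False.
answers = {1: True, 2: True, 3: False, 4: True, 5: True, 6: True}
similar_pairs = [(1, 2), (3, 4)]   # each pair says roughly the same thing
opposite_pairs = [(5, 6)]          # e.g. 'disturbed most of the time' vs 'relaxed most of the time'

vrin = vrin_score(answers, similar_pairs)         # only pair (3, 4) is inconsistent
trin = trin_true_score(answers, opposite_pairs)   # both items of pair (5, 6) marked True
```

A subject answering at random would be expected to accumulate VRIN points, while a subject who
marks ‘True’ regardless of content would accumulate points on the TRIN count shown here.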
It must be made clear that MMPI-2 is scored for 10 clinical scales in addition to its
validity scales. With the exception of the last scale, that is, ‘social introversion’, the other
clinical scales were developed in the criterion-keyed manner in which the responses of clinical
subjects and normal subjects were contrasted. The Social introversion scale was developed by
contrasting the responses from two groups of college students (Gaston et al. 1994).

The psychometric properties of the MMPI and MMPI-2 are comparable. Median test-retest
coefficients range from the low .50s to the .90s. Median split-half reliability coefficients for both
MMPI and MMPI-2 run in the .70s. Despite satisfactory reliability coefficients, MMPI-2 has some
problems. The important ones are that its items overlap among the scales, that true-false keying is
imbalanced, that intercorrelations among the scales are high (for example, scales 7 and 8 correlate
between .64 and .87) and that generalizability across demographic variables is lacking (Kaplan &
Saccuzzo 2001).

Despite these limitations, the objective scoring of MMPI and MMPI-2 has made it popular both
as a research and clinical tool. According to Graham (1993), it has been used in more than
10,000 studies. One survey has revealed that it is preferred by about 90% of clinical
psychologists (Aiken 1988).

MMPI-A is the new form of MMPI developed specially for use with adolescents. It consists of
478 items covering areas specifically relevant to adolescents, such as school and family problems,
and above all it provides age-appropriate norms. MMPI-A basically retains most of the
features of the MMPI and MMPI-2 including 10 clinical scales and four validity scales. The age
range of the standardization sample was 14 to 18 years.

Besides, MMPI-A has its own validity scales (F1 and F2) as well as some content scales,
supplementary scales and subscales that are unique to it and some that are similar to MMPI-2.

In conclusion, it can be said that both MMPI-2 and MMPI-A are today very promising
and important tools for assessing personality, and publications of books and articles on these two
seem to be continuing unabated (Pope, Butcher & Seelen 1993; Butcher, Graham & Ben-Porath
1995).


California Psychological Inventory 

The California Psychological Inventory (CPI) was originally developed by Harrison Gough in 
1957 and it, then, consisted of 480 items. The inventory has undergone revision, and as a
consequence, its original length of 480 items has been shortened to 462 items in the 1987
revision and most recently to 434 items which is the third edition (Gough & Bradley 1996). All 
the 434 items are to be answered as either ‘True’ or ‘False’ and yield scores on 20 scales. 

CPI is, in fact, a structured personality test constructed primarily by the criterion-group
strategy. One of the major features of CPI is that more than a third of the 434 items are drawn from
MMPI and it is specifically used for assessing personalities of normal adult populations. On this
point, CPI differs from MMPI, which mainly assesses the personalities of abnormal persons. In the
latest version of CPI, of the 20 scales, three are validity scales designed to assess test-taking
attitudes. The three validity scales are: Well-being (Wb), Good impression (Gi) and Communality (Cm).
The Wb scale is based on responses by normals asked to ‘fake bad’; the Gi scale is based upon
responses by normals asked to ‘fake good’ and the Cm scale is based on a frequency count of
highly popular responses. In this way Wb, Gi and Cm are constructed, respectively, to detect
subjects who ‘fake bad’, ‘fake good’ and respond randomly. The remaining 17 scales provide
scores on personality dimensions like Dominance (Do), Sociability (Sy), Capacity for Status (Cs),
Social Presence (Sp), Self-acceptance (Sa), Independence (In), Empathy (Em), Responsibility (Re),
Socialization (So), Self-control (Sc), Tolerance (To), Achievement via Conformance (Ac),
Achievement via Independence (Ai), Intellectual Efficiency (Ie), Psychological-Mindedness (Py),
Flexibility (Fx) and Femininity (Fe).

Like MMPI-2, all CPI scores are converted into standard score scales with a mean of 50 and a
standard deviation of 10. The advantage of CPI is that it can be used for assessment of normal
personalities. The MMPI and MMPI-2 generally do not apply to assessment of normal subjects.
Therefore, the CPI has good potential as a measure of normal personalities (Bolton
1992; Groth-Marnat 1999). Cross-cultural studies have shown that the CPI is useful in explaining
personality differences between ethnic groups (Dana 1993; Davis, Hoffman & Nelson 1990).







Research has shown that CPI is very useful for predicting the following:
(i) high school and college achievement
(ii) grade point average in medical institutions
(iii) effectiveness of students and teachers
(iv) effectiveness of police and military personnel
(v) leadership and executive success

It has also been found that this test is useful in identifying adolescents or adults who follow a
delinquent or criminal path in their lives.

Reliability coefficients of CPI are very similar to those reported for the MMPI. Short-term 
test-retest reliability coefficients range from .49 to .90 and long-term coefficients range from .38 
to .77. However, CPI shares some of the limitations of MMPI. Like the MMPI, the various scales of 
CPI have shown considerable intercorrelation. Similarly, like the MMPI, true-false scale keying is
often extremely unbalanced.
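
A test-retest reliability coefficient of the kind quoted above is simply the Pearson correlation
between the scores obtained by the same persons on two occasions. The scores below are invented for
illustration and say nothing about the CPI itself.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical standard scores of six persons tested twice on the same scale.
first_testing = [48, 55, 61, 40, 52, 67]
second_testing = [50, 53, 64, 44, 49, 65]
r_tt = pearson_r(first_testing, second_testing)
```

The same function applied to the scores on two different scales gives the scale intercorrelations
mentioned in the paragraph above.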


The Factor Analytic Strategy

The factor analytic strategy uses the technique of factor analysis to derive empirically the major
dimensions of personality. In fact, factor analysis is a statistical technique for reducing the
redundancy in a set of intercorrelated scores. The principal-components method is one such
method of factor analysis (Hotelling 1933) that tends to find the minimum number of common
factors that can account for an interrelated set of scores. Factor analysis starts with a large
database consisting of the intercorrelations of a large number of items or tests. Subsequently,
these intercorrelations are factor analyzed to find the minimum number of factors that account for
as much of the variability in the data as possible. Then, the researchers try to label these factors
by finding out what the items related to a particular factor have in common.

Guilford, Cattell and Eysenck were the three very important psychologists who tried to
develop personality tests using the factor analytic strategy. Guilford made the pioneering effort by
determining intercorrelations of a wide variety of tests and then factor analyzing the results in an
attempt to find the main dimensions underlying all such tests. After locating such factors, items
that correlated highly with these factors could finally be taken into the test to capture the major
dimensions of personality. The result of this initial attempt using the factor analytic strategy was a
series of inventories, which were published by Guilford and his associates in the 1940s. For
example, Guilford (1940) found five factors after factor analyzing the test of
introversion-extroversion. These factors were: social introversion, thinking introversion,
depression, cycloid tendencies and rhathymia, popularly known as the Inventory of Factors STDCR.
Further factorial studies by Guilford and Martin led to the development of two other personality
inventories, the Guilford-Martin Inventory of Factors GAMIN (1943a) and the Guilford-Martin
Personality Inventory (1943b). After combining two highly correlated factors to avoid confusion and
redefining the remaining factors, a single ten-factor inventory was developed. This
inventory is known as the Guilford-Zimmerman Temperament Survey (Guilford & Zimmerman
1956). The ten factors or dimensions were: General activity (G), Restraint (R), Ascendance (A),
Sociability (S), Emotional stability (E), Objectivity (O), Friendliness (F), Thoughtfulness (T),
Personal relations (P) and Masculinity (M). All items are self-statements as in MMPI, and the
subject must indicate ‘Yes’ or ‘No’ for each statement. In the survey, three verification keys are
included to detect falsification and for evaluating the validity of the profile.

However, this first factor analytic structured personality test failed to be popular, partly
because it was overshadowed by MMPI and partly because of its arbitrary, subjective way of
naming the various factors. The current position is that the Guilford-Zimmerman Temperament
Survey is mainly of historical interest.
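
The procedure outlined at the start of this section (intercorrelate the tests, then extract the
smallest number of factors that accounts for most of the variability) can be sketched as follows.
The sketch is deliberately limited: it extracts only the first principal component of a correlation
matrix, using power iteration in plain Python, whereas a full factor analysis would extract further
factors and usually rotate them. The scores and function names are invented for illustration.

```python
from math import sqrt

def correlation_matrix(columns):
    """Pearson correlations between every pair of score columns."""
    def r(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / sqrt(sxx * syy)
    return [[r(x, y) for y in columns] for x in columns]

def first_principal_component(R, iterations=200):
    """Largest eigenvalue of R and the matching loadings, by power iteration."""
    k = len(R)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(k)) for i in range(k))
    loadings = [x * sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings

# Invented scores of six persons on four short tests.
tests = [
    [2, 4, 5, 7, 8, 9],   # test 1
    [1, 3, 5, 6, 8, 9],   # test 2
    [3, 4, 4, 7, 7, 9],   # test 3
    [5, 2, 6, 3, 7, 4],   # test 4, largely unrelated to the others
]
R = correlation_matrix(tests)
eigenvalue, loadings = first_principal_component(R)
share_of_variance = eigenvalue / len(tests)   # proportion of total variance explained
```

In this toy data the first three tests load highly on the single component and the fourth does
not, so one factor summarizes most of what the first three tests have in common. Naming that factor
is the subjective step for which the Guilford inventories were criticized.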


Cattell’s Personality Questionnaire

Cattell is another important psychologist who used the factor analytic strategy in
developing a structured personality test. On the basis of his factor analytic research, Cattell
developed the Sixteen Personality Factor Questionnaire (16 PF), which is now in its fifth edition
(Cattell, Cattell & Cattell 1993; Russell & Karol 1994). The 16 PF was originally designed for
the assessment of personality for ages 16 and over. It is a very widely used
forced-choice test which is available in five separate forms. Each form consists of declarative
stems that force the testee to respond to a specific situation by choosing from two
or three forced-choice options (three options in Forms A, B, C and D). It yields
16 scores separately on 16 traits of personality, each of which is bipolar. Each of these 16 factors
or source traits has been given a letter symbol or letter-number combination. All these 16 factors
and their letter symbols are presented in Table 10.1.


Table 10.1 Cattell’s 16 personality factors and their letter symbols

Low score             Factor            High score                 Letter symbol
Reserved              Warmth            Outgoing                   A
Less intelligent      Intelligence      More intelligent           B
Stable ego strength   Ego strength      Emotionality/Neuroticism   C
Humble                Dominance         Assertive                  E
Sober                 Impulsivity       Happy-go-lucky             F
Expedient             Conformity        Conscientious              G
Shy                   Boldness          Venturesome                H
Tough-minded          Sensitivity       Tender-minded              I
Trusting              Suspiciousness    Suspicious                 L
Practical             Imagination       Imaginative                M
Forthright            Shrewdness        Shrewd                     N
Placid                Insecurity        Apprehensive               O
Conservative          Radicalism        Experimenting              Q1
Group-tied            Self-sufficiency  Self-sufficient            Q2
Casual                Self-discipline   Controlled                 Q3
Relaxed               Tension           Tense                      Q4

The questionnaire has been revised with a view to improving its psychometric properties
(Anastasi & Urbina 1997). Consistent with the factor analytic strategy, items that correlated
highly with each of the 16 major factors or source traits were included, and those with
relatively low correlations were discarded from the test.
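
The inclusion rule just described amounts to a filter on item-factor correlations. In the sketch
below, the correlations and the cut-off of .30 are invented for illustration; the text does not state
what cut-off was actually used for the 16 PF.

```python
def retain_items(item_factor_r, cutoff=0.30):
    """Split items into retained and discarded by |correlation| with the factor."""
    retained = [item for item, r in item_factor_r.items() if abs(r) >= cutoff]
    discarded = [item for item, r in item_factor_r.items() if abs(r) < cutoff]
    return retained, discarded

# Hypothetical correlations of five trial items with one source trait.
item_factor_r = {
    "item_1": 0.62,
    "item_2": 0.08,
    "item_3": -0.47,   # negatively keyed items can still be good indicators
    "item_4": 0.21,
    "item_5": 0.55,
}
retained, discarded = retain_items(item_factor_r)
```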
One of the unique features of the 16 PF is the inclusion of ‘Problem Solving questions’, a set of
15 items presented contiguously at the end of the inventory. These items make up the
Reasoning scale, which is basically intended to act as a quick measure of general mental ability.
Besides these, the questionnaire now also includes measures for detecting three kinds of response
style, namely, measures for assessing random responding, acquiescence tendency and the tendency to
show socially desirable or undesirable qualities.

A brief description of each of the 16 factor scales is presented below.

1. Reserved-Outgoing, Scale A: Scale A, in fact, is theoretically a measure of warmth.
Persons securing low scores (sten score 1-3) on this scale generally seem to show
negative traits like being aloof and formal and showing hostility and suspiciousness. These traits
are associated with paranoid schizophrenia, although the scale does not measure such a
pathological dimension. Such persons find contact with other persons to be anxiety
provoking or even painful. On the other hand, persons scoring high on Scale A (sten
score 8-10) like to be around other people. Such persons are more emotionally
expressive and they may feel disturbed if they are forced into a situation where they are
not allowed contact with others. Such persons are extrovert and easygoing.

2. Less intelligent-More intelligent, Scale B: Scale B measures intelligence. Cattell
distinguishes between two kinds of intelligence: crystallized intelligence and fluid
intelligence. Fluid intelligence refers to innate abilities, whereas crystallized intelligence
refers to the abilities that are products of cultural learning, training and schooling.
Low scorers on Scale B tend to show low mental capacity, poor judgement, less
perseverance, poor ability to work with abstract problems and low morale. Poor scores
on this scale may also indicate anxiety and inability to concentrate as well as poor intelligence.
High scores on Scale B indicate high intelligence, ability to work with abstract ideas, good morale,
good judgement and high perseverance. It may be mentioned here that
Cattell has also separately developed the Cattell Culture Fair Intelligence Test, which is designed
to assess fluid or innate intelligence, independent of cultural learning and training.

3. Stable Ego-Strength-Emotionality/Neuroticism, Scale C: Scale C measures the ego strength
of the person. The essence of this factor is the inability to control one’s impulses or to deal
realistically with various problems (Cattell 1965). High scorers on this scale tend to show
emotional maturity, general lack of anxiety and ability to deal with frustrating or very
difficult situations. Low scorers on this scale demonstrate a general emotional lability,
inability to handle frustration, evasion of responsibility and a tendency to show excessive
worry over trivial things. From this description, it is clear that Scale C plays an important
role in identifying many psychiatric disorders.

4. Humble-Assertive, Scale E: Scale E is a measure of dominance. Low scorers are
unsure, retiring, modest, meek, quiet and obedient. Therefore, they are associated with
conventionality, docility, dependence, submissiveness and obedience. High scorers are
self-assertive, boastful, conceited, aggressive and forceful.
Therefore they are associated with aggressiveness
and independence. A high E score is not necessarily pathological but can be a serious
sign in connection with other scale elevations that point to personality problems or a
tendency towards emotional outbursts. Cattell (1965) has shown that
men and boys tend to score higher on dominance than women and girls. Likewise,
neurotics who are gradually improving show an increase in dominance scores (Ryckman 1993).





5. Sober-Happy-go-lucky, Scale F: Scale F measures impulsivity. Happy-go-lucky persons
or high-surgency people (high F scorers) are characterized by being joyous, cheerful,
sociable, responsive, energetic, witty, humorous, as well as talkative. Sober people or
people with desurgency (low F scorers) are characterized as depressed, pessimistic,
seclusive, introspective, worrying and retiring. Cattell has described low F persons as
dull, rigid, anxious and depressed. Such individuals are frequently associated with
depressive states. On the other hand, high F scorers reflect impulsivity, extroversion and
adaptability (Karson & O'Dell 1976).

6. Expedient-Conscientious, Scale G: Scale G is a measure of group conformity, and the items
of this scale are very similar to those of the MMPI L scale. High scorers are associated with high
superego strength, responsibility, conscientiousness, moral correctness and dutifulness.
Such persons are either excessively rigid and moralistic or are answering in a ‘faking
good’ manner. Low scorers on Scale G tend to be self-indulgent, undependable,
frivolous and fickle and are generally unconcerned about group norms and standards.


7. Shy-Venturesome, Scale H: Scale H measures boldness. High H scores are associated
with extroversion, social boldness, responsiveness, interest in the opposite sex and
insensitivity to danger signals. On the other hand, low H scores are associated with
restraint, shyness, emotional cautiousness and unfriendliness. Cattell has suggested that
low H scores are associated with a tendency towards schizoid personality.


8. Tough-minded-Tender-minded, Scale I: Scale I is a measure of emotional sensitivity.
High scores on Scale I are associated with dependence, imaginativeness,
attention-seeking behaviour, emotional sensitivity, insecurity and hypochondriasis. On
the other hand, low scores are associated with self-reliance, lack of sentimentality,
cynicism, logic, practicality and lack of hypochondriasis. Some studies demonstrate a
close relationship between Scale I and the Mf scale of MMPI.


9. Trusting-Suspicious, Scale L: Scale L measures suspiciousness. High L scores are
associated with being jealous, dogmatic, frustrated, domineering and irritable. In
extreme cases, such high scores may show signs of paranoia. Low scorers on this scale are
characterized as being permissive, trusting, tolerant, nonhostile and uncritical.


10. Practical-Imaginative, Scale M: Scale M measures imagination. Persons having low
scores on this scale are described as being practical, objective, conservative,
conventional and not over-imaginative. On the other hand, persons having high scores
are characterized by being fanciful, subjective, unconventional, imaginative and
interested in various types of art and philosophy. High scorers tend to do better in jobs
requiring imagination and artistic skills. However, individuals with low scores on the M
scale tend to do better in jobs with practical and realistic demands.


11. Forthright-Shrewd, Scale N: Scale N is a measure of shrewdness. This scale is found to
correlate significantly with the Hysteria scale of MMPI. Persons having high scores on this
scale are characterized by worldliness, ambition, social awareness, a calculating mind
and a strong tendency to take advantage of situations. Persons having low scores
on this scale are characterized by spontaneity, vagueness, lack of self-insight,
genuineness and a blind trust in human nature. According to Karson and O'Dell (1976),
Scale N is the least useful scale of the 16 PF.


12. Placid-Apprehensive, Scale O: This scale measures guilt proneness. Low O scorers are
self-confident, resilient and placid. Low O scores are suggestive of an untroubled and
adequate individual who is cheerful and is likely to act when necessary. Such low scores
may be associated with weak superego controls. On the other hand, high O scorers are
persons who are insecure, guilt-prone, apprehensive and troubled, and who are
more likely to be anxious, depressed and very sensitive to the approval and disapproval
of others. From such a description, it is obvious that this scale plays an important role in
clinical analysis of the 16 PF.


13. Conservative-Experimenting, Scale Q1: Scale Q1 is a measure of rebelliousness. High
scorers are associated with radicalism and liberalism. Such persons have a tendency to
be experimenting, analytical and free thinking. On the other hand, low Q1 scores are
associated with conservatism. Such individuals tend to show respect for traditional
things and are unwilling to change the way things are done. Persons having extremely high Q1
scores are unable to accept authority, a trait often found in sociopathic personalities.


14. Group-tied-Self-sufficient, Scale Q2: This scale measures self-sufficiency. High scorers
are self-sufficient persons who like to depend upon their
own resources and judgements. Low scorers are persons who are group dependent,
frequent joiners and also good followers.


15. Casual-Controlled, Scale Q3: Scale Q3 is a measure of the ability to bind anxiety. High
scorers are persons who are controlled, socially precise and compulsive. Their
willpower is also strong. Such persons think carefully before they act and do not let
emotions disturb routine. These persons are well-controlled and dependable. In a
nutshell, such persons have sound mental health. On the other hand, low scorers are
persons who are characterized by a lack of control, a tendency to show disrespect and
carelessness towards social rules and a tendency to follow their own urges. Such persons also
possess the trait of overreactivity and they often fail to handle stress in a productive way.
If a low Q3 is found in the presence of some other anxiety indicators, it is enough to
suspect that the person is having some kind of emotional trouble.


16. Relaxed-Tense, Scale Q4: This scale measures free-floating anxiety and tension. High
scorers on Q4 tend to exhibit characteristics like tension, frustration and a highly anxious
approach to life. Such scores indicate a person with extreme problems, a cry for help and
socially undesirable tendencies. This scale, in fact, is the best indicator of
neurotic anxiety. Low scorers, on the other hand, tend to exhibit low tension, low
anxiety and a relaxed approach to life. Such persons do not exhibit frustration even in a
situation that may rightly produce it.
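
The scale descriptions above refer to sten scores of 1-3 as low and 8-10 as high. A sten
(‘standard ten’) is a standard score with a mean of 5.5 and a standard deviation of 2, rounded and
limited to the range 1 to 10. The sketch below applies this standard definition with invented norms;
the published 16 PF norm tables are not reproduced in the text and may differ from a purely linear
conversion.

```python
from math import floor

def sten(raw, norm_mean, norm_sd):
    """Convert a raw score to a sten (mean 5.5, SD 2, limited to 1..10)."""
    z = (raw - norm_mean) / norm_sd
    value = floor(5.5 + 2 * z + 0.5)   # round half up
    return max(1, min(10, value))

def band(sten_score):
    """Labels used in the scale descriptions: 1-3 low, 8-10 high."""
    if sten_score <= 3:
        return "low"
    if sten_score >= 8:
        return "high"
    return "average"

# Hypothetical norms for one factor: mean raw score 10, SD 3.
high_example = sten(15, norm_mean=10, norm_sd=3)   # 9, in the high band
low_example = sten(1, norm_mean=10, norm_sd=3)     # 1, in the low band
```
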
Test-retest reliability and internal consistency reliability for the 16 primary factor scales of
the 16 PF are satisfactory. Short-term test-retest reliability coefficients for the 16 source traits
range from .65 to .93 with a median coefficient of .83. Despite the scientific method used for
deriving the factors, the 16 source traits of the 16 PF do intercorrelate with each other, and
sometimes this correlation becomes as high as .75 (Cattell, Eber & Tatsuoka 1970). To deal with this
overlapping among the factors, the 16 factors themselves were factor analyzed, which yielded
second-order factors for which scores can be obtained (Kaplan & Saccuzzo 2001).

From the 16 PF scale, parallel inventories for ages 12 to 18 and for younger children up to
age 12 have been constructed. The former is known as the Junior Senior High School Personality
Questionnaire and the latter is called the Children's Personality Questionnaire. To extend the
16 PF scale to the assessment of psychopathology, items relating to psychopathology have
been factor analyzed, resulting in 12 new factors in addition to the 16 factors that
assess normal personality. These new factors together constitute the inventory
called the Clinical Analysis Questionnaire or CAQ (Delhees & Cattell 1971). The 12
abnormal source traits assessed by the CAQ are presented in Table 10.2.




Table 10.2 Factors of CAQ

D1    Hypochondriasis
D2    Zestfulness/Suicidal disgust
D3    Brooding discontent
D4    Anxious depression
D5    Energy euphoria
D6    Guilt and resentment
D7    Bored depression
Pa    Paranoia
Pp    Psychopathic deviation
Sc    Schizophrenia
As    Psychasthenia
Ps    General psychosis


Despite the best care, when the 16 PF scale is compared with the MMPI and MMPI-2, it is found to
be less useful. Besides this, the claim of the 16 PF to measure the basic source traits of personality
is also not widely accepted. Despite all these limitations, the 16 PF remains one of the
exemplary illustrations of the factor analytic approach to personality testing.


Eysenck Personality Questionnaire (EPQ) 


Based on a lifelong programme of factor analytic questionnaire research, Eysenck and Eysenck (1975)
developed a series of tests designed to measure normal and abnormal dimensions of personality.
Eysenck identified three major dimensions of personality: Psychoticism (P), Extraversion (E) and
Neuroticism (N). The Eysenck Personality Questionnaire (EPQ) comprises items that intend to
measure these three dimensions of personality. The EPQ consists of 90 statements to be answered
in terms of either 'Yes' or 'No' and is specially suited for persons aged 16 and older. It
incorporates a Lie (L) scale to assess the validity of the testee or examinee's responses. The
Junior EPQ is available for assessing these dimensions among children aged 7 to 15 and
consists of 81 statements. A brief description of these three scales is as under:


(1) P Scale: The P scale assesses the dimension of psychoticism, which is not the same as
psychosis such as schizophrenia, although a schizophrenic is expected to score high on it.
It assesses traits like poor concentration, poor memory, insensitivity, liking for odd things,
disregard for danger and convention, cruelty and lack of caring for others. Such persons are
considered peculiar by others. A high score on the P scale indicates impulsivity, aggressive and
hostile traits, empathy defects and a preference for odd or unusual things. Persons with antisocial
personality and schizoid personality often obtain high scores on this dimension. A low score on the P
scale indicates some desirable characteristics like empathy and interpersonal sensitivity. A few
examples of items of the P scale are:

Do you take risk just for fun? (T)
Do you often break the rule? (T)


(2) E Scale: The E scale assesses the dimension of extraversion and its polar opposite,
introversion. High scores on the E scale indicate a tendency to be outgoing, a preference for activities
involving contact with other people and a desire for novelty. Such persons are fun-loving and
gregarious. Low scores on this scale indicate introverted traits such as preference for solitude and
quiet activities. Such persons show tender mindedness, introspectiveness and seriousness. A few
examples of items of the E scale are:

Do you like plenty of excitement? (T)
Are you quiet when with others? (F)

(3) N Scale: The N scale assesses the dimension of neuroticism that includes traits like
slowness in thoughts and actions, suggestibility, tendency to repress unpleasant facts, lack of
sociability, and below-average emotional control, willpower and capacity to exert oneself. A high score
indicates that the person is nervous, maladjusted and overemotional and a low score indicates
that the person is stable and confident. A few examples of items of the N scale are:

Are your feelings easily hurt? (T)
Do you feel dullness in life? (T)
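
The keyed answer printed in parentheses after each sample item suggests how such a scale is scored: one point for every response that agrees with the key. The following is only a minimal sketch in Python, assuming that 'Yes' corresponds to a (T) key and 'No' to an (F) key. The key is built solely from the sample items quoted above (it is not the published EPQ scoring key), and the helper name score_scale and the respondent's answers are hypothetical.

```python
# Hypothetical scoring key built only from the sample items quoted in the text.
# True means the keyed answer is 'Yes' (T); False means it is 'No' (F).
SAMPLE_KEY = {
    "P": {"Do you take risk just for fun?": True,
          "Do you often break the rule?": True},
    "E": {"Do you like plenty of excitement?": True,
          "Are you quiet when with others?": False},
    "N": {"Are your feelings easily hurt?": True,
          "Do you feel dullness in life?": True},
}


def score_scale(key, answers):
    """Count the answers that agree with the keyed direction.

    `answers` maps item text to True ('Yes') or False ('No');
    items the respondent left unanswered are skipped.
    """
    return sum(1 for item, keyed in key.items()
               if item in answers and answers[item] == keyed)


# Made-up responses of one examinee.
answers = {
    "Do you take risk just for fun?": False,
    "Do you often break the rule?": False,
    "Do you like plenty of excitement?": True,
    "Are you quiet when with others?": False,
    "Are your feelings easily hurt?": True,
    "Do you feel dullness in life?": False,
}

scores = {scale: score_scale(key, answers) for scale, key in SAMPLE_KEY.items()}
print(scores)
```

Note that the (F)-keyed item earns a point for a 'No' answer, which is why the examinee above scores on both E items.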
A major focus of research with the EPQ has been to find out the empirical correlates of
extraversion and its opposite, introversion, and such researches have linked several perceptual
and physiological factors to the dimension E-I. Some of the important such linkages are:
(i) Extroverts have a greater need for entertaining external stimulation.
(ii) Extroverts are readily conditioned to stimuli associated with sexual arousal.
(iii) Extroverts are more suggestible than introverts.
(iv) Introverts are vigilant in watch keeping.
(v) Introverts' performance on signal detection tasks is comparatively better.
(vi) Introverts are less tolerant of pain but more tolerant of sensory deprivation.

The psychometric properties of the EPQ are satisfactory. The one-month test-retest
reliabilities were .78 (P), .89 (E), .86 (N) and .84 (L). The internal consistency reliabilities were in
the .70s for P and the .80s for the remaining three scales. The construct validity of the EPQ is also well
established in several studies using emotional, behavioural, attentional, learning and therapeutic
criteria (Eysenck & Eysenck 1975, 1985).


The Theoretical Strategy 
In the theoretical strategy to structured personality testing, items are selected to assess those
variables or constructs that are specified by a major theory of personality. After selecting the items
and grouping them into scales, construct-related evidence for validity is sought. In fact,
psychometricians have adopted the theoretical strategy in order to avoid the various types of
disagreement and biases that usually stem from factor analytic strategies. The theoretical
strategy requires that the items of the test must be consistent with the theory. If the theory states
that personality can be broken down into five major dimensions, then the test constructors
would strive to write items that tap each of these five dimensions. In this way, the theoretical
strategy attempts to create homogeneous scales, and to achieve this aim test constructors may use various
types of statistical techniques. The following are the major structured personality tests developed
using theoretical strategy.


Edwards Personal Preference Schedule (EPPS)

One of the earliest and best-known examples of a theoretically derived structured personality test
is the Edwards Personal Preference Schedule (EPPS), developed by Edwards (1954, 1959). The
theoretical basis for the EPPS is the need theory of personality proposed by Murray and Morgan
(1938). In developing the EPPS, Edwards selected 15 needs from Murray's list and constructed
items with content validity for each. Each need is paired with the other 14 needs twice on the test.
There is one additional scale, the scale of response consistency, which provides a check on
random responding. It consists of 15 repeated items. Testees who carefully complete the EPPS
generally answer at least 9 of the 15 items consistently. A score lower than 9 is indicative of
potential interpretive problems. The entire inventory consists of 210 pairs of statements. The 15 needs
and the consistency scale have been presented in Table 10.3. Within each pair, the testee is forced to
choose the one statement which is more characteristic of himself or herself. This forced-choice
technique results in what is called an ipsative score, which expresses the strength of
each need not in absolute terms but in relation to the strength of the testee's other
needs. Such scores compare individuals against themselves and produce data that
reflect the relative strength of each need for that person (Popper). In this way, each person
serves as his or her own standard. Since each need is paired twice with the other 14 needs, the
maximum possible raw score on each scale for an examinee or testee would be 2 × 14 = 28.
Needless to say, the lowest possible score on a need scale would be zero.
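
The arithmetic behind the forced-choice format can be checked with a short sketch. The Python below is illustrative only: the need labels are placeholders and the respondent is simulated with random choices, neither of which comes from the EPPS itself. It assumes, as stated above, that each of the 15 needs is paired twice with each of the other 14, which gives 2 × C(15, 2) = 210 need-against-need pairs.

```python
import itertools
import random

N_NEEDS = 15   # needs selected from Murray's list
REPEATS = 2    # each pair of needs appears twice on the test

needs = [f"need_{i}" for i in range(1, N_NEEDS + 1)]  # placeholder labels

# Every unordered pair of needs, presented REPEATS times.
pairs = [pair for pair in itertools.combinations(needs, 2)
         for _ in range(REPEATS)]

max_raw_score = REPEATS * (N_NEEDS - 1)   # 2 x 14 = 28

# Simulate a respondent who picks one statement at random from each pair.
rng = random.Random(0)
scores = dict.fromkeys(needs, 0)
for a, b in pairs:
    scores[rng.choice((a, b))] += 1

print(len(pairs), max_raw_score, sum(scores.values()))
```

Because every choice adds exactly one point to exactly one need, the 15 scores always sum to the number of pairs, whatever the respondent chooses. A high score on one need can therefore be obtained only at the expense of the others, which is what makes the scores ipsative.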


Table 10.3 EPPS Needs and their brief description

n Achievement:      To do one's best and accomplish something difficult
n Deference:        To conform to what is expected of one
n Exhibition:       To be the center of attention, to have others notice oneself
n Intraception:     To analyze the motives and feelings of oneself and others
n Dominance:        To influence others and to be regarded as leader
n Nurturance:       To help others in trouble, to treat others kindly
n Order:            To make plans before starting on a difficult task
n Autonomy:         To be able to come and go as desired, to say or do what one wants
n Affiliation:      To be loyal to others, to participate in friendly groups
n Succorance:       To seek encouragement from others, to gain sympathy
n Abasement:        To feel guilty when one does something wrong, to accept
                    the blame when things don't go right
n Change:           To do new and different things, to experience novelty
n Endurance:        To keep at a task until it is finished, to work hard at a task
n Heterosexuality:  To engage in various types of activities with opposite sex,
                    to go out with members of the opposite sex
n Aggression:       To attack contrary points of view, to tell others what one
                    thinks of them
Consistency:



Raw scores on the EPPS are converted into percentile equivalents with the use of various
types of norms tables representing both college and general adult populations. The strength of the
EPPS, particularly in clinical use, is that it measures dimensions of personality that are
nonthreatening to the client (Murphy & Davidshofer 1988).


Despite its interesting and impressive features, the EPPS has been criticized and not well
received by the reviewers (Heilbrun 1972). Studies have shown that the effects of
social desirability on the test have not been eliminated and that the EPPS
can be easily faked even after use of the forced-choice technique. Likewise, although norms are provided
for conversion of raw scores into percentiles, the advisability of using them is open to question
because of the ipsative nature of scores. An ipsative frame of reference is most suited for
intraindividual comparison whereas normative reference data are most suited for
interindividual comparison. Since the EPPS combines both frames of reference, it makes the
interpretation of scores more confusing and less meaningful.


Personality Research Form (PRF) and Jackson Personality Inventory (JPI)


Other efforts to use the theoretical strategy in constructing a structured personality test include
the development of the Personality Research Form or PRF (Jackson 1967) and the Jackson
Personality Inventory or JPI (Jackson 1976a, 1976b). Like the EPPS, the PRF and the JPI were
based on Murray's theory of needs.

While developing the PRF, Jackson had two major goals. First, he wanted to develop a useful
tool for personality research and second, he intended to develop a test for assessing some general
personality dimensions assumed to be of importance in environments like schools, colleges,
guidance clinics, industries, etc. The PRF has been designed basically to fulfil these two
goals. Initially, the PRF was developed in two sets of parallel forms which differed in number of
items and number of scales. The shorter forms (A and B) consisted of 15 scales including one
validity scale and 300 items, and the longer forms (AA and BB) consisted of 22 scales including
two validity scales (Desirability and Infrequency) and 440 items. The validity scales, as in the MMPI,
were added to detect possible scoring errors, careless test-taking attitudes and social desirability
responses. Both sets of parallel forms have been used primarily with college students. This
necessitated the development of a newer form (E) of the PRF, which, in fact, represented an
extension of the test to other populations. Since it was developed later, it was possible to use a
modified item-selection technique that permitted all 22 scales of the longer
forms to be retained in a shortened 352-item test. The names of all 22 scales appearing on the
PRF are: abasement, achievement, affiliation, aggression, autonomy, change, cognitive
structure, dependence, dominance, endurance, exhibition, harm avoidance, impulsivity,
nurturance, order, play, sentience, social recognition, succorance, understanding, desirability
and infrequency. Jackson has designed these 22 scales to be bipolar in nature, meaning thereby
that a low score on any scale may signify not just the absence of the trait but also the presence of its
opposite. For example, a high score on the exhibition scale would mean that the respondent has a
need to be conspicuous, dramatic and colourful in social situations. A low score on this
scale would be indicative of a fear of group activities and active avoidance of such activities.

The Jackson Personality Inventory (JPI), which was developed after the PRF using similar
though more refined test construction procedures, is more appealing and practical
(Jackson 1976b, 1994). The JPI has one form which comprises 320 true-false items in 16 scales
including one validity scale, for use with high school students through
adults. The names of the scales are: anxiety, breadth of interest, complexity, conformity, energy level,
innovation, interpersonal affect, organization, responsibility, risk-taking, self-esteem, social
adroitness, tolerance, social participation, value orthodoxy and infrequency.

Other features of both the PRF and the JPI are that they are balanced in true-false keying and
also have no item overlap. Therefore, the scales are independent. On the whole, it can
be said that by combining theory-based attempts with statistical procedures, these
tests appear to have established a very scientific trend in the construction of structured
personality tests.


The Myers—Briggs Type Indicator (MBTI) 


The MBTI has been developed by Myers and Briggs (1962; Myers & McCaulley 1985). It is
a theoretically constructed test based on Carl Jung's theory of Psychological Types (Quenk 2000)
and is widely used for assessment of personality in normal individuals. The MBTI is based upon
Jung's well-known dichotomy of extroverted and introverted attitudes (E and I) and his
classification of opposing ways of perceiving (sensation vs intuition, S vs N) and
contrasting approaches to judgement (thinking vs feeling, T vs F). Besides these, it also includes
a polarity of preferences in orientation towards the outer world (judgement or perception, J vs
P) which, of course, was not clear in Jung's theory. The scores on these four dimensions, which
are presumably independent, result in sixteen possible "type formulas" that present the
combinations of the letters of the preferred direction on each of the four dimensions. For
example, 'INTP' is one combination type that represents an "introverted, intuitive, thinking and
perceptive" type, indicating a person who would (a) be quiet and reserved, (b) enjoy solving
problems with logic and analysis, (c) be interested mainly in ideas, and (d) tend to exhibit sharply
defined interests (Myers & McCaulley 1985). Each preferred direction also has a numerical score.
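
The count of sixteen type formulas follows directly from four two-way preferences (2 × 2 × 2 × 2 = 16). The Python sketch below enumerates them and builds one formula from numerical preference scores. It is only an illustration: the scores are invented, the helper type_formula is hypothetical, and picking the higher-scoring letter (with ties going to the first letter) is an assumption of this sketch rather than the published MBTI scoring procedure.

```python
import itertools

# The four dichotomies named in the text, as pairs of opposing letters.
DICHOTOMIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

# One letter from each dichotomy gives every possible type formula.
type_formulas = ["".join(letters)
                 for letters in itertools.product(*DICHOTOMIES)]


def type_formula(preferences):
    """Build a type formula from numerical preference scores.

    `preferences` maps each letter to a (hypothetical) preference score;
    the higher-scoring letter of each dichotomy is kept. Ties go to the
    first letter, an arbitrary choice made only for this sketch.
    """
    return "".join(a if preferences[a] >= preferences[b] else b
                   for a, b in DICHOTOMIES)


# Invented scores for one respondent.
example = {"E": 7, "I": 19, "S": 9, "N": 15,
           "T": 20, "F": 4, "J": 10, "P": 14}

print(len(type_formulas), type_formula(example))
```

With these invented scores the respondent prefers I over E, N over S, T over F and P over J, so the resulting formula is the 'INTP' type used as the example above.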

The basic aim of the MBTI is to determine where an individual falls on the
extroversion-introversion dimension and on which of the four modes (that is, S, N, T and F) he
relies. The underlying assumption of the MBTI is that every person has specific preferences in the
way he construes his experiences and these preferences underlie the needs, values, interests and
motivation.


Reviews of studies have demonstrated the applications of the MBTI. For example, the MBTI has
been used to study the relationship between personality and financial success (Mabon 1998),
and sensitivity and different types of purposes in life (Doerries & Ridley 1998). Likewise, this test
has been successfully used in the study of leadership (Fitzgerald 1997), emotional perception
(Martin et al. 1996), communication styles (Loffredo & Opt 1998) and career choices (McCaulley
& Martin 1995).


Measures of Self-concept 


Based upon a theoretical strategy, some personality tests have been developed to assess
self-concept, which broadly refers to the set of assumptions a person has about himself or herself.

Several adjective checklists have been constructed for evaluating self-concept. In these
checklists, a list of adjectives is presented and the subjects are asked to indicate those adjectives
which apply to them. One such popular adjective checklist is Gough's Adjective Checklist that
contains 350 adjectives in alphabetical order from 'absent-minded' to 'zany' (Gough & Heilbrun
1980). The adjectives are based on the 15 Murray needs covered by the EPPS and two specialized
personality theories, namely, Berne's (1961, 1966) Transactional Analysis theory and Welsh's
theory of creativity and intelligence. The Student Self-concept Scale (SSCS) is another measure of
self-concept based upon Bandura's theory of self-efficacy (Gresham, Elliott & Evans-Fernandez
1993). The SSCS assesses three major domains of self-concept, that is, academic, social and
self-image. Within each domain, the respondents have to indicate not only how confident they
are regarding what is being asked by the item but also how important the items are to them and
how confident they are that possessing certain attributes or doing something will lead to certain
outcomes. The Personal and Academic Self-Concept Inventory (PASCI) is another popular
measure of self-concept developed by Fleming and Whalen (1990). It is designed for high school
and college students. This inventory attempts to investigate the hierarchical, multifaceted model
of self-concept developed by Shavelson and his colleagues. The PASCI, in its current version,
consists of a global self-esteem scale and six additional facet scales. Of these six, two deal with
social aspects of the self-concept. The Piers-Harris Children's Self-Concept Scale is another measure of
self-concept that contains 80 self-assessment items and requires a 'Yes' or 'No' response. Besides
these, the Tennessee Self-Concept Scale is another measure of self-concept. This is a formal
paper-and-pencil test of self-concept.


A relatively novel approach to assessment of self-concept is based on Rogers's theory of self.
For evaluating self-concept, Rogers has recommended the use of the Q-Sort technique. In this
technique, the person has to sort several self-assessments in piles from least to most personally
descriptive. He is asked to make two sorts of cards: one sort describes what the
person really is (real self) and the second sort describes what the person believes he or she
should be (ideal self). Rogers's theory obviously predicts that major discrepancies between these
two types of self reflect low self-esteem and poor adjustment (Rogers 1961).
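
One common way of putting a number on the discrepancy between the two sorts is to correlate the pile positions given to the same statements in the real-self and ideal-self sorts. The text does not prescribe this index, so the Python sketch below rests on that assumption, and the pile numbers in it are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical pile numbers (1 = least descriptive ... 5 = most descriptive)
# given to the same eight statements in the two sorts.
real_self = [1, 2, 2, 3, 3, 4, 4, 5]
ideal_self = [5, 4, 3, 3, 2, 4, 2, 1]

r = pearson_r(real_self, ideal_self)
print(round(r, 2))
```

A correlation near +1 would mean that the person sorts the statements in much the same way for the real and the ideal self. A low or negative value, as in this invented example, would correspond to the large discrepancy that the theory links with low self-esteem and poor adjustment.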


Combination Strategies 

In modern times, test developers are using a combination strategy in which they mix various
types of strategies for developing a structured personality test. The two most popular structured tests
which rely upon combination strategies are the Millon Inventories and the NEO Personality
Inventories.


The Millon Inventories

Millon (1994) has developed a structured inventory designed to assess personality disorders. It is
named the Millon Clinical Multiaxial Inventory (MCMI), originally published in 1977.
Revised in 1987, it was named MCMI-II, and revised again in 1994, it has been named
MCMI-III. Thus, the Millon Clinical Multiaxial Inventories consist of three versions of the same
test: the MCMI, the MCMI-II and the MCMI-III. Following Piotrowski (1997), the Millon Inventories
rank second only to the MMPI and MMPI-2 as a measure of pathological traits of personality.
However, the MCMI-III has two advantages over the MMPI-2. First, it is much shorter (that is,
only 175 true-false items) and therefore more palatable in clinical practice. Second, it is well
planned and organized to identify the clinical syndromes in a manner that is compatible with the
DSM-IV of the American Psychiatric Association.

The MCMI was originally published in 1977 and it comprises 24 scales for assessing
pathological traits of personality. In fact, the construction of this inventory was deliberately
undertaken to meet the criticisms of the MMPI and to maximally utilize intervening advances in
the diagnosis of psychopathology. Soon some controversy arose regarding whether the MCMI
provided a good measure of personality disorders as defined by the Diagnostic and Statistical
Manual of Mental Disorders (DSM-III). Consequently, the MCMI was revised and the MCMI-II was
published (Millon 1987). In this edition, Millon not only tried to accommodate the personality
disorders of DSM-III and its revised version, DSM-III-R, but also added two new scales to the
original 24 scales. Obviously, the basic purpose of the MCMI-II was to help clinicians make a
diagnosis on the basis of Axis II of the DSM-III-R. Still, controversy continued over how much the
MCMI-II really was an improvement over the MCMI. Despite this, it remained in
widespread use, and following Millon's highly influential theory of personality disorders, which
served as one of the conceptual bases in the original formulation of the Axis II personality disorder
categories of DSM-III, some significant changes occurred in this direction.

When the new version of DSM, that is, DSM-IV, was published in 1994, Millon also revised the
MCMI-II and produced the MCMI-III, which comprises 175 brief, self-descriptive statements to be
marked 'true' or 'false' by the testee. Of these, 85 items are taken from the MCMI-II and 90 items have
been taken from the first test (Groth-Marnat 1999). The MCMI-III has two additional scales: the
Depressive personality scale and the Post-traumatic stress disorder scale. In fact, the MCMI-III is rooted
in Millon's biopsychosocial views of personality functioning and psychopathology (Millon et al.
1996). The score profile of the MCMI-III includes 24 clinical scales, apart from four modifying scales
(cf. Table 10.4). All these clinical scales are grouped under four major categories, namely,
Clinical Personality Patterns, Severe Personality Pathology, Clinical Syndromes and Severe
Syndromes. Clinical Personality Patterns have 11 scales that closely coincide with the Axis II
personality disorders in the DSM-IV; Clinical Syndromes have 7 scales that measure Axis I
disorders; Severe Syndromes and Severe Personality Pathology each have three scales. Clinical
Personality Patterns and Severe Personality Pathology have scales which are designed to assess
Axis II personality pattern disorders of DSM at different levels of severity. Clinical Syndromes and
Severe Syndromes have scales which are designed to assess the Axis I syndromes of DSM. In addition
to these four major categories, there are modifying indices which consist of four scales (Disclosure,
Desirability, Debasement and Validity). Scores on the first three scales are used for adjusting the
other scale scores in the upward or downward direction, based on minimization or exaggeration
of symptoms, respectively. The Validity scale assesses the examinee's approach to the test. In fact,
this scale consists of three items with unusual content, and endorsement of them suggests that the
examinee was either not paying attention, responded randomly or was being intentionally
oppositional.


Table 10.4 Scales of the Millon Clinical Multiaxial Inventory-III

Clinical Personality Patterns
  Schizoid (1)
  Avoidant (2A)
  Depressive (2B)
  Dependent (3)
  Histrionic (4)
  Narcissistic (5)
  Antisocial (6A)
  Aggressive (Sadistic) (6B)
  Compulsive (7)
  Passive-Aggressive (Negativistic) (8A)
  Self-Defeating (8B)

Severe Personality Pathology
  Schizotypal (S)
  Borderline (C)
  Paranoid (P)

Clinical Syndromes
  Anxiety (A)
  Somatoform (H)
  Bipolar: Manic (N)
  Dysthymia (D)
  Alcohol Dependence (B)
  Drug Dependence (T)
  Post-traumatic Stress Disorder (R)

Severe Syndromes
  Thought Disorder (SS)
  Major Depression (CC)
  Delusional Disorder (PP)

Modifying Indices
  Disclosure (X)
  Desirability (Y)
  Debasement (Z)
  Validity (V)





Recently, Millon has developed two new instruments that extend his approach to the
assessment of personality and psychopathology. One is the Millon Adolescent Clinical Inventory
(MACI) and the other is the Millon Index of Personality Styles (MIPS). The MACI was intended to be a
tool for clinical use in the assessment of adolescents between ages 13 and 19 (T. Millon, C. Millon &
Davis 1993). The MACI has been primarily derived from the Millon Adolescent Personality Inventory
or MAPI (Millon, Green & Meagher 1982), which was an older test designed for use in
clinical assessment as well as vocational counselling and academic advice. The MIPS is primarily
designed to act as a measure of personality for normal adults who generally seek assistance for
family, work or social problems in various counselling situations (Millon 1994).

The Millon Inventories have reliability and validity similar to other personality tests. Hence,
they are psychometrically sound and are in widespread use.


The NEO Personality Inventory (NEO-PI-R)


The NEO-Personality Inventory-Revised is one of the newest major personality inventories
developed by Costa and McCrae (1992). The developers of this inventory have used both factor
analysis and theory in item development and test construction. This inventory measures the Big
Five personality factors: Neuroticism (N), Extraversion (E), Openness (O), Agreeableness (A) and
Conscientiousness (C). 'Big' refers to the fact that each factor subsumes a large number of more
specific traits. The Big Five are as broad and abstract in the personality hierarchy as Eysenck's
superfactors. Originally, Costa and McCrae (1992, 1995) had focused only on three factors, namely,
Neuroticism, Extraversion and Openness, thus the title NEO-Personality Inventory. Subsequently,
they added the factors of Agreeableness and Conscientiousness to conform to the Big-Five factor
model. They differentiated each of these five factors (or domains) into six more specific facets
which were defined as more specific traits or components that make up each of the broad big-five
factors (cf. Table 10.5). Each facet is assessed by 8 items. Thus the most recent NEO-PI-R consists
of 240 items (that is, 5 factors × 6 facets × 8 items). For each item, the testees indicate the
extent to which they agree or disagree, using a five-point rating scale.
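
The item count can be verified with a few lines of Python. The structure below comes from the text (5 factors, 6 facets per factor, 8 items per facet). The numeric coding of the five-point rating scale as 0 to 4 is an assumption made only to illustrate the resulting score ranges.

```python
FACTORS = ["Neuroticism", "Extraversion", "Openness",
           "Agreeableness", "Conscientiousness"]
FACETS_PER_FACTOR = 6
ITEMS_PER_FACET = 8

# Assumed numeric coding of the five-point agree-disagree scale.
RATING_MIN, RATING_MAX = 0, 4

total_items = len(FACTORS) * FACETS_PER_FACTOR * ITEMS_PER_FACET

# Lowest and highest possible sums under the assumed coding.
facet_range = (ITEMS_PER_FACET * RATING_MIN, ITEMS_PER_FACET * RATING_MAX)
factor_range = (FACETS_PER_FACTOR * facet_range[0],
                FACETS_PER_FACTOR * facet_range[1])

print(total_items, facet_range, factor_range)
```

Under this assumed coding a facet score can run from 0 to 32 and a factor (domain) score from 0 to 192; a different coding, such as 1 to 5, would shift these ranges without changing the count of 240 items.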


Table 10.5 NEO-PI-R facet scales associated with Big Five factors

Neuroticism:              anxiety, impulsivity, angry hostility, depression,
                          self-consciousness, vulnerability
Extraversion:             activity, excitement seeking, positive emotions,
                          warmth, gregariousness, assertiveness
Openness to Experience:   actions, ideas, values, fantasy, aesthetics, feeling
Agreeableness:            altruism, trust, straightforwardness, compliance,
                          modesty, tendermindedness
Conscientiousness:        competence, dutifulness, order, achievement-striving,
                          self-discipline, deliberation


The NEO-PI-R has good reliability and validity across different data sources, and with other
instruments such as Goldberg's (1992) adjective inventories. In conclusion, it can be said that the
NEO-PI-R reflects modern trends in personality test construction by its dependence on logic,
theory and liberal use of factor analysis and statistical approaches in test construction. It appears
to be exceptionally promising for assessing a wide range of characteristics throughout the
different cultures of the world.

On the whole, it may be concluded that psychometricians' attempts to construct various
types of structured personality tests are definitely praiseworthy. Of the structured personality tests,
the Millon Inventories, the NEO-PI-R and the MMPI-2 are considered to be the dominant tests of the
present time.


Self-report Inventories in India 
In India, several self-report personality inventories have been constructed. Some of the
foreign-made tests have also been adapted to suit Indian conditions. Sohoni (1953) developed a
test of temperament and character for high school children. The reliability of the test ranged from
0.44 to 0.54 and the validity coefficient ranged from 0.23 to 0.45. Singh (1967) constructed an
adjustment inventory for college students. It measured adjustment in five areas (home, health,
society, emotion and education) and had a total of 102 'Yes-No' items. The internal consistency
reliability ranged from 0.92 to 0.94. The validity coefficient of the inventory against Asthana's
adjustment inventory was 0.62. Bengalee (1964) developed a Multiphasic Personality Inventory
which was named the Youth Adjustment Analyser (YAA). The purpose of the inventory was to
screen out maladjusted students from the college-going population. It covered five areas of
personal and social adjustment, namely, unhealthy parent attitudes, general home adjustment,
aggressive behaviour, neuroticism and interests. For measuring the first two personality traits, two
independent scales were developed: the Parent Attitude Scale for measuring unhealthy parental
attitude and the General Home Adjustment Scale for measuring general home adjustment. The
Parent Attitude Scale has five subscales, namely, dominance, acceptance, submission, rejection
and total parent attitudes. The General Home Adjustment Scale consists of 32 items and the
Parent Attitude Scale consists of 35 items. Items of the former are available in English
and Hindi languages. Prasad (1974) developed an adjustment inventory which
measures parent adjustment, home and family adjustment, social adjustment, emotional
adjustment and self-acceptance. The inventory has a total of 279 items and norms for different
sections of the population. Singh & Sinha (1979) have developed a personality test known as the
Differential Personality Scale which measures nine personality traits, namely, decisiveness,
responsibility, emotional stability, masculinity, friendliness, heterosexuality, ego-strength,
curiosity and dominance. This scale has been revised by Singh & Singh (2002) and has been
renamed the Differential Personality Inventory which, apart from the above nine dimensions, also
includes the dimension of self-concept. The inventory has a total of 159 items in Hindi and is
meant for college students. However, it can also be administered in upper classes at school. The
test-retest reliability coefficients for the various dimensions of the scale ranged upward from 0.73
and the internal consistency coefficients ranged from 0.70 to 0.89. The validity coefficients of the
different dimensions ranged from 0.55 to 0.84. The intercorrelations of all the ten dimensions
were low and statistically not significant.

There have been some adaptations of foreign-made tests to suit Indian conditions. Mohsin &
Hussain (1981) adapted the Bell Adjustment Inventory (Students’ form) in Hindi. The Hindi
adaptation of the inventory consisted of 135 items; its present modified form, published in
1987, has only 124 items. The test-retest reliability coefficients of the four areas of adjustment
of the inventory ranged from 0.700 to 0.926 and the split-half reliability coefficient ranged from
0.738 to 0.932. The validity coefficients of the four areas of adjustment ranged from 0.272 to
0.785 against the Neuroticism Scale of the Hindi adaptation of Eysenck’s Personality Inventory,
and from -0.088 to -0.255 against the extroversion scale of the same inventory. Bell’s
Adjustment Inventory has also been adapted in Hindi by Saxena (1959) for the age range of
11-20 years. Singh & Jamuar (1971) adapted the Maslow Security-Insecurity Inventory in Hindi.
There are 70 items in the adapted inventory. The test-retest reliability coefficient was 0.79 and the 
split-half reliability coefficient was 0.86. Percentile norms were developed separately for male 
and female students of the BA Part-I class. Singh (1972) adapted the Maudsley Personality 
Inventory in Hindi. The test-retest reliability coefficient for the E scale (Extroversion) was 0.77 
and for the N scale (Neuroticism) was 0.82. The inventory was validated against the 
Guilford-Zimmerman Temperament Survey and several other criteria. The validity coefficients 
were satisfactory and statistically significant. 


OVERCOMING DISTORTIONS IN SELF-REPORT INVENTORIES 


Distortions in the actual responses to self-report measures of personality are a major concern for
users of personality tests. Hence, it is essential that attempts be made to overcome these
distortions and make the self-report responses more representative of the true responses. The
following measures may be adopted for the purpose:


Establishment of Rapport 


In any personality test, distortions occur when the testees feel ‘discomfort’ and find themselves in
an unfriendly environment. It is, therefore, essential that before actual administration of
self-report inventories the tester should make every effort to establish a warm and cordial
relationship called ‘rapport’ with the testees. The development of such a relationship is, to a large
extent, dependent upon the tester’s skills and his subtle modification of the testing situation. After
a rapport is established, the testees will express the truth in an unhesitant way, thus eliminating the
major proportion of distortions.




Use of Forced-Choice Technique 


The forced-choice technique has been used in controlling faking-good or socially desirable
response sets. In forced-choice items, the subject is forced to choose between two or more
equally desirable or undesirable terms, phrases or statements. The subject who wants to
give socially desirable responses is outwitted by the forced choice between equally desirable
statements. Several psychologists have used forced-choice items to control the faking-good
tendency but the results have not been encouraging (Anastasi 1968). Probably, the reason is that
forced-choice items have some defects. For example, such items require more time to obtain an
equal number of responses. Reliability is also decreased in the case of such items because the choice
becomes difficult, and the decrease in reliability may offset the gain in validity.
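
The idea of pairing equally desirable statements can be illustrated with a small sketch. It assumes
that each statement already carries a judged social-desirability rating; the statements, the ratings
and the tolerance value are hypothetical and are not taken from any published inventory.

```python
def build_forced_choice_pairs(statements, tolerance=0.5):
    """Pair statements whose desirability ratings differ by at most `tolerance`.

    `statements` is a list of (text, rating) tuples. The greedy neighbour
    pairing used here is an assumption made for illustration only.
    """
    ordered = sorted(statements, key=lambda item: item[1])
    pairs = []
    i = 0
    while i < len(ordered) - 1:
        first, second = ordered[i], ordered[i + 1]
        if abs(first[1] - second[1]) <= tolerance:
            pairs.append((first[0], second[0]))
            i += 2  # both statements are used up
        else:
            i += 1  # no close match; leave this statement unpaired
    return pairs


# Hypothetical statements with hypothetical desirability ratings.
hypothetical_statements = [
    ("I like to help my friends", 7.8),
    ("I finish what I start", 7.6),
    ("I sometimes lose my temper", 3.1),
    ("I avoid difficult tasks", 2.9),
    ("I enjoy quiet evenings", 5.5),
]

for pair in build_forced_choice_pairs(hypothetical_statements):
    print(pair)
```

Because the two members of each pair are about equally desirable, choosing the "better-looking"
statement gives the subject no advantage. The statement without a close match is simply left out.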


Concealing the Main Purpose of the Test 

When the subjects do not know the real purpose of the test, it becomes difficult for them to fake, 
although in such a situation they may be more suspicious and defensive in their responses. 
Subjects may guess the purpose from the nature of the items, but they cannot be certain which
inferences about their tendencies or traits are to be made, and this naturally lessens the
probability of faking.

There are two ways in which concealment can be done. One method is to state
a plausible purpose of the test which, in fact, is not the real purpose. For example, a
personality test, with utmost care and effort, may be described as a test of ability, and if it appears
so to the examinees, the faking is likely to be reduced to a great extent. Another method of
concealment is to insert information which is actually false among items of information which
are actually true. For example, the students may be asked to endorse those titles in a
booklist which they have read. This list of titles will also contain some fictitious titles.
The greater the number of endorsements of such fictitious titles, the higher the deceit or boasting.
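
The booklist method described above amounts to counting endorsements of titles that do not exist.
A minimal sketch follows; the titles and the function name `boasting_index` are invented for
illustration, and expressing the count as a proportion is an assumption.

```python
def boasting_index(endorsed_titles, fictitious_titles):
    """Return the proportion of fictitious titles the respondent claims to have read."""
    if not fictitious_titles:
        raise ValueError("the booklist needs at least one fictitious title")
    endorsed = set(endorsed_titles)
    false_claims = sum(1 for title in fictitious_titles if title in endorsed)
    return false_claims / len(fictitious_titles)


# Hypothetical data: three invented titles hidden in the booklist.
fictitious = ["The Glass Orchard", "Rivers of Tin", "A Map of Quiet"]
student_endorsements = ["Gitanjali", "Rivers of Tin", "A Map of Quiet"]

index = boasting_index(student_endorsements, fictitious)
print(f"Proportion of fictitious titles endorsed: {index:.2f}")
```

A respondent who endorses none of the fictitious titles obtains an index of zero. Higher values
suggest more boasting, but any cut-off for interpretation would have to be set empirically.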

Theoretically, concealing the real purpose of the test may appear a good means for 
controlling faking but it may not be considered desirable on ethical grounds because it may 
produce the impression among people that psychologists are tricksters.


Use of Verification and Correction Keys

Some psychologists have recommended the use of various kinds of correction and verification
keys which give an indication whether or not an examinee is presenting a true picture of the self.
For example, the famous MMPI uses four validity scores which aim at checking carelessness,
evasiveness, misunderstanding and operation of other response sets. Likewise, the Edwards
inventory uses several forced-choice pairs of statements of which 15 pairs are presented for the
second time at random intervals within the test. A higher degree of inconsistency in choice of
responses on the two occasions indicates the operation of some kind of response set and gives a
confused picture of the self.
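
The consistency check on repeated pairs can be sketched as a simple comparison of the two sets of
choices. The response strings below are hypothetical, and the function is only an illustration of
the principle, not the scoring routine of any published inventory.

```python
def count_inconsistent_pairs(first_choices, repeat_choices):
    """Count repeated pairs on which the respondent chose differently the second time."""
    if len(first_choices) != len(repeat_choices):
        raise ValueError("both passes must cover the same repeated pairs")
    return sum(1 for first, repeat in zip(first_choices, repeat_choices)
               if first != repeat)


# Hypothetical choices ('A' or 'B') on 15 repeated forced-choice pairs.
first_choices = list("ABABBAABABBABAA")
repeat_choices = list("ABABBAABBBBABAB")

changed = count_inconsistent_pairs(first_choices, repeat_choices)
print(changed, "of", len(first_choices), "repeated pairs were answered differently")
```

How many changed answers should be treated as a sign of a response set is a matter for the test
manual; no threshold is assumed here.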


SITUATIONAL TESTS 
A situational test represents a sort of compromise between a standardized test and observational
methods of assessing personality. As its name implies, personality traits are measured on the basis
of observations or ratings of what a person thinks and does in a given situation, which resembles a
real situation of everyday life. The person concerned usually has no idea that he is being
examined. Ordinarily, the situation represented by such tests is a social situation having
opportunity for interaction with other individuals and is specially designed to emphasize those
aspects of personality which are under study.


The first situational test of personality was developed by the United States Office of Strategic
Services (OSS) during World War II in an effort to screen out men for military assignments.








Situational tests are more suited to the measurement of traits like leadership, [...],
extroversion-introversion and the like. Sometimes these tests utilize
observable units of behaviour as the basis of assessment of the traits. Such situational tests are
called behavioural tests because they are directly concerned with observable behaviour.
Honesty, self-control and co-operation are such traits and all these traits may be placed under
the head of character. One of the first attempts to study these traits was made by Hartshorne, May
& Shuttleworth (1930) in their Character Education Inquiry (CEI). In general, the
behavioural tests utilized natural situations lying within the day-to-day routine of a school child
such as games, classroom examinations, etc. The children placed in these situations were not
aware of the fact that they were being studied. The CEI tests were principally designed to
measure behavioural traits like altruism, honesty and self-control. However, most of the CEI tests
were concerned with measuring honesty among the children by providing opportunity for
cheating. These tests utilized different modes or techniques for studying honesty. In one such
technique, called the duplicating technique, the children were administered one of the common
classroom tests like the arithmetical reasoning test or the vocabulary test. A set of duplicates of the
children’s responses, unknown to them, was prepared by the experimenter. In the subsequent
administration, the original test was again given with a request to score their own responses with
the help of a scoring key. A simple comparison of the responses scored with the duplicated
responses revealed whether or not the children had changed their responses in scoring, that is,
whether they had cheated or not. Other CEI tests, intended to measure honesty, provided
situations in which the person had an opportunity to lie or to steal something.

Likewise, self-control was measured through CEI tests by means of a complex task. One such task
involved reading lines in which the words were mixed (or run) together. Running the words
together naturally created reading difficulty and the
children’s self-control was measured by the length of the time devoted to reading the mixed
words. One example of such lines in which words were run together is given below:


THEKINGORDEREDTHEKNIGHTTOBRINGTHECULPRITBEFOREHIM


In a fair evaluation of the situational tests, it can be said that although they represent real-life
situations and therefore their findings can be easily generalized to natural life situations, they
have several important limitations. First, situational tests are extremely time-consuming, costly,
and laborious techniques. They are time-consuming because ordinarily observations in
contrived situations last for several hours; they are costly because they demand the services of
trained observers; and they are laborious techniques because creating a real-life situation
involves a good deal of labour on the part of observers. Second, subjectivity and bias may
operate in the observation of the situational tests. Observers, though professionally trained, may
be swayed by a certain bias towards the individuals observed, and
other subjective elements may enter into their observations. Although an
attempt is made to control such subjective elements, they cannot be completely eliminated because
the observer is a human being. Third, for increasing the reliability and validity of
observation in situational tests, the experimenter emphasises studying an isolated bit of
behaviour. The problem arises as to what significance or meaning can be attached
to such an isolated bit of behaviour. For example, let us suppose that in a particular situation
6-year-old Mohan hits 6-year-old Sohan. What is the meaning of this hitting? Does it represent
aggression of Mohan towards Sohan? Or does it represent an expression of dominance of Mohan over
Sohan? Or does it represent a sense of superiority of Mohan over Sohan? We are unable to answer
unless this behaviour is studied in relation to other types of behaviours of Mohan towards Sohan.
Thus, the problem of determining the meaning of an isolated bit of behaviour is always present
in a situational test. Fourth, ‘what to observe’ is also a problem in situational tests, which cannot
be planned to study all behaviours of the individual. Only limited behaviour should be
selected for observation. In such a situation, determination and selection of a meaningful set of
behaviours to be observed is always an important problem. Lastly, whether the observer should
be kept visible or invisible is also a problem in most situational tests. If the group is small, the
physical presence of the observer in the group may change what actually takes place in the
group. How the observer can be fitted into such a setting is always a problem. All these
behavioural methods, including the situational tests, convey the full meaning about the traits to
be assessed only when they are followed by other standardized tests. Merely on the basis of
observational methods, it is difficult to arrive at a particular conclusion. Thus, observational
methods may be regarded as only stop-gap arrangements to be replaced by some other more
appropriate methods in the future.


MEASUREMENT OF INTERESTS, VALUES AND ATTITUDES 


No programme of assessment of personality would be said to be complete and coherent unless it 
pays due attention to the measurement of interests, values and attitudes of a person. 


Meaning and Types of Interest Test 


An interest may be defined as a preference for one activity over another. This simple definition 
reflects two points. Firstly, interests involve the selection and ranking of different activities along a 
like-dislike dimension. One student, for example, may select three activities like reading a novel,
listening to a radio broadcast and watching television and rank them 3, 2 and 1 respectively.
Obviously, here he prefers watching television to listening to a radio broadcast and reading a
novel. Secondly, interests involve activities or behaviours indulged in by individuals. Interests are
often expressed by action verbs such as reading a novel, listening to a radio broadcast, planting
flowers, watching television, etc.

Based upon methods employed to evoke responses, Super & Crites (1962) have classified 
interests into three types— 

(a) Expressed Interest 

(b) Manifest Interest 

(c) Inventoried Interest 

Expressed interest, as its name implies, is the simplest and most direct way of obtaining
information about interest. An interest is said to be expressed whenever a person states his
preference for one activity over another. A teacher may express his interest in reading textbooks
over reading novels.

An interest is made manifest when a person voluntarily participates in an activity. A student 
who attends Yoga classes clearly demonstrates interest in Yoga and this is his manifest interest. 

There is no necessary relationship between expressed interest and manifest interest, though
in many situations, they tend to coincide or overlap. Most individuals engage in some
activities which they claim to dislike and, conversely, most people may refuse to engage in
activities which they claim to enjoy.

Inventoried interests are those interests which are measured by interest tests that compare 
interest in different activities. As we know, most of such inventories measure only a limited 
sample or set of interests. The interests assessed by such inventories are called inventoried 
interests. 


In the measurement of interests, investigators have concentrated primarily upon vocational
pursuits. The most obvious approach to the measurement of interests of a person is simply to
directly ascertain his preferences. But experience has shown that the responses given to
such direct questions are often unrealistic and unreliable. This is particularly true when the
interest inventory is being administered to young persons and adolescents. There are two reasons
for this. First, young people or adolescents are not fully aware of the different activities
required in a particular occupation. As a consequence, they are unable to judge whether they
would like or dislike a particular activity in a given occupation or area of activity. Second, the
preferences for occupations are guided by stereotypes which are formed on the basis of the
information released by popular mass media like television, radio, magazines, movies, etc. But in
reality, things may be quite different. For example, the life of a doctor, a professor or an engineer
may be reflected in a particular way to suit the motives of the media. For the reasons stated above,
it was realized that a more indirect method of assessing interest should be evolved and, as a
consequence of this line of thinking, investigations by different psychologists were undertaken.
Thus several standardized interest inventories were developed.

In America, the first interest inventory was introduced in 1921 and it was called the Carnegie
Interest Inventory. Since then, several types of interest inventories have been developed. Some of
these interest inventories became famous throughout the world. Here we shall review such
famous interest inventories.


The Strong Interest Inventory (SII)


The Strong Interest Inventory was first constructed by E K Strong, Jr., shortly after World War I, that
is, around 1920-21. It was then named the Strong Vocational Interest Blank (SVIB), which was
published in 1927. Subsequently, it was revised several times. In the revised 1966 version of the
SVIB, 399 items were related to 54 occupations which were meant for men. A separate form
presented 32 different occupations for women.


Beginning in the 1970s, extensive innovations were introduced and implemented in
successive revisions of SVIB (Campbell 1974; Campbell & Hansen 1981; Harmon et al. 1994).
The following three types of principal changes were introduced:

(a) The introduction of a theoretical framework using Holland’s hexagonal model of
personality expressed through vocational choice

(b) The merger of the earlier men’s and women’s forms and renorming them on new male
and female samples

(c) A substantial increase in the number of scales for vocational/technical occupations
requiring less than a college degree

The 1974 version of SVIB introduced by Campbell is called the Strong-Campbell Interest
Inventory (SCII). In its current form, the SCII consists of 325 items and is divided into seven parts
(Hansen 2000). The seven parts of SCII are: Occupations (131 items), School subjects (36 items),
Activities (51 items), Amusements (39 items), Types of people (24 items), Preference between two
activities (30 items), Your characteristics (14 items). In the first five sections the examinee records
his or her preferences by marking ‘Like’, ‘Indifferent’ or ‘Dislike’. Items in these five parts fall
into categories like occupations, school subjects, activities, leisure activities and day-to-day
contact with various types of people. The sixth part requires the examinee to express preferences
between paired items (such as dealing with things versus dealing with people). The seventh
section asks the examinee to mark a set of self-descriptive statements ‘Yes’, ‘No’ or ‘?’.

The inventory provides several levels of scores, differing in breadth. The broadest level is the six
General Occupational Themes scores, the next includes 25 Basic Interest Scales and the most specific
level provides the 211 available Occupational Scales. The six General Occupational Themes
were based upon Holland’s theory of vocational choice. After many years of research,
Holland had postulated that interests express personality and that people can be classified into
one or more of six categories according to their interests. The General Occupational Themes
are Realistic (R), Investigative (I), Artistic (A), Social (S),
Enterprising (E) and Conventional (C), which together form the R-I-A-S-E-C model of Holland.
The interest patterns of Holland’s six personality factors may be explained in Table 10.6.


Table 10.6 Interest pattern of Holland’s six personality factors

Realistic (R theme):       Such persons enjoy outdoor activities and technical materials.
Investigative (I theme):   Such persons are interested in science and the process of detailed
                           investigation.
Artistic (A theme):        Such persons enjoy self-expression and like to be dramatic.
Social (S theme):          Such persons are interested in providing help to others and in
                           activities involving other people.
Enterprising (E theme):    Such persons are interested in power and political strength.
Conventional (C theme):    Such persons have clerical interests and like to be well organized.

Each theme characterizes not only a type of person but also the type of working environment
that such a person finds most convenient and congenial.


The second level of scores is for administrative indexes. In fact, these are of less personal
importance to the testees and are needed only to ensure that errors were not made in
administration and scoring of the test.

The third level of scores provides a summary of a person’s basic interests. These scores are
designed to provide specific information about the ‘likes’ and ‘dislikes’ of the testees. For
example, they may suggest whether a person scored high, low or about average in preference for
science and mechanical activities.

The final level of scores is for occupational scales. In fact, these scales occupy an important
place in the test profile, which shows the person’s score for each of 124 occupations that are
broken into six general occupational themes. The scoring for occupational scales is different
from the general theme and basic interest scales because the occupational scale compares the
testee’s or respondent’s score with the scores of people working in different professions. The
general theme and basic interest scales tend to compare the respondent’s score with those of
people in general. A recent innovation in the SII is the addition of Personal Style Scales (Harmon et
al. 1994). These scales basically aim to assess preferences for broad styles of living and working.
The four personal style scales are:

(i) Work Style Scale: High score on this scale indicates a preference to work with people
and low score indicates an interest in ideas, data and things.

(ii) Learning Environment Scale: High score indicates a preference for academic learning
environments and a low score indicates preference for applied learning activities.

(iii) Leadership Style Scale: High score indicates comfort in taking charge of others and low
score indicates uneasiness in such activities.

(iv) Risk Taking/Adventure Scale: High score indicates a preference for risky and
adventurous activities and low score indicates preference for safe and predictable activities.

All these four personal style scales help in vocational guidance by displaying level of
comfort with distinctive styles.

Researches have provided sufficient evidence to the fact that the interests measured by the
various tests are stable (Kaplan & Saccuzzo 2001). All scores on the Strong Inventory are
described in terms of standard scores with a mean of 50 and standard deviation of 10. The
reliabilities as well as concurrent and construct validity of the test are highly satisfactory.
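
A standard score with a mean of 50 and a standard deviation of 10 is obtained by expressing a raw
score as a deviation from the reference group mean in standard deviation units, multiplying by 10
and adding 50. The sketch below shows this arithmetic; the reference group scores are hypothetical,
and the use of the population standard deviation is an assumption, since the norming procedure of
the Strong Inventory is not described here.

```python
from statistics import mean, pstdev


def to_standard_scores(raw_scores, norm_group_scores):
    """Convert raw scores to standard scores with mean 50 and SD 10.

    The reference (norm) group supplies the mean and standard deviation.
    """
    group_mean = mean(norm_group_scores)
    group_sd = pstdev(norm_group_scores)  # population SD: an illustrative choice
    if group_sd == 0:
        raise ValueError("norm group scores must vary")
    return [50 + 10 * (raw - group_mean) / group_sd for raw in raw_scores]


# Hypothetical raw scale scores of a reference group.
norm_group = [12, 15, 18, 21, 24]

for raw, standard in zip([18, 24], to_standard_scores([18, 24], norm_group)):
    print(f"raw {raw} -> standard score {standard:.1f}")
```

A raw score equal to the group mean becomes 50, and a raw score one standard deviation above the
mean becomes 60, which makes profiles comparable across scales of different lengths.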




The Kuder Occupational Interest Survey (KOIS)

The Kuder Occupational Interest Survey (KOIS) was developed to provide a
unique alternative to SCII. The earliest version was the Kuder Preference
Record-Vocation in which forced-choice triad items were used and the respondents indicated
which of the three activities they would like most and which least. In this original version the
scores were obtained not for specific occupations but for 10 broad interest areas, namely
Outdoor, Mechanical, Computational, Scientific, Persuasive, Artistic, Literary, Musical, Social
Science, and Clerical. Subsequently, this inventory was revised and a downward extension of the
Kuder Preference Record-Vocation was made. This revised inventory was named the Kuder
General Interest Survey (KGIS) which was designed for Grades 6 to 12, using simpler language
and easier vocabulary. A still later version appeared and was called the Kuder
Occupational Interest Survey (KOIS) and it provided scores with reference to specific
occupational groups as does the Strong Inventory (Kuder & Diamond 1979; Kuder & Zytowski
1991). Unlike the Strong, on the KOIS, the respondent’s score on each occupational scale is
expressed as a correlation between his/her interest pattern and the interest pattern of the
particular occupational group. The KOIS now provides both occupational scores and 10 broad
homogeneous basic interest scores called Vocational Interest Estimates (VIE) which are, in fact,
percentile scores derived from short scales equivalent to the 10 interest area scores of the early Kuder
Preference Record. They can also be converted into the R-I-A-S-E-C themes of Holland by
establishing direct correspondence for some scales and by averaging percentiles on two or three
Kuder scales for others (for example, average of Kuder Artistic, Literary, and Musical for Holland’s
Artistic theme).
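
Two of the computations mentioned above can be sketched briefly. The first expresses a
respondent's score as a correlation between two interest patterns; the Pearson coefficient is used
here only as a stand-in, since the text does not say which coefficient the KOIS employs. The second
averages the Artistic, Literary and Musical percentiles to estimate Holland's Artistic theme. All
data values and function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two patterns of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [value - mean_x for value in x]
    dev_y = [value - mean_y for value in y]
    numerator = sum(a * b for a, b in zip(dev_x, dev_y))
    denominator = sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    if denominator == 0:
        raise ValueError("a pattern with no variation cannot be correlated")
    return numerator / denominator


def holland_artistic_percentile(vie_percentiles):
    """Average the three Kuder percentiles that the text maps to Holland's Artistic theme."""
    parts = [vie_percentiles[area] for area in ("Artistic", "Literary", "Musical")]
    return sum(parts) / len(parts)


# Hypothetical interest patterns over the same six activities.
respondent_pattern = [3, 1, 2, 3, 1, 2]
occupational_group_pattern = [3, 1, 2, 2, 1, 3]
print(round(pearson_r(respondent_pattern, occupational_group_pattern), 2))

# Hypothetical Vocational Interest Estimates (percentiles).
print(holland_artistic_percentile({"Artistic": 80, "Literary": 60, "Musical": 70}))
```

The higher the correlation, the closer the respondent's pattern of likes and dislikes is to that
of the occupational group in question.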

Studies show that the psychometric properties of KOIS are very satisfactory. Short-term
reliabilities tend to be high (between .89 and .95) and also, evidence suggests that scores remain
stable for as long as 30 years (Zytowski 1996). Reviewers have, however, pointed out the dearth of
predictive validity data for this scale. Apart from this, the inventory also fails to address the
effects of its forced-choice format on scores (Herr 1989; Tenopyr 1989).


Jackson Vocational Interest Survey (JVIS) 


The Jackson Vocational Interest Survey was developed by Jackson (1977). It consists of
forced-choice type items where the respondents must indicate their preference between two
equally popular interests. The rational classification of vocational interest items was of two types.
One set was defined in terms of work roles and the other set was defined in terms of work styles.
Work roles relate to what a person does on the job and work styles refer not to job-related
activities but to preferences for particular working environments or situations in which a certain
kind of behaviour is expected. Examples of work styles include independence, playfulness, etc.,
which are, of course, related to a person’s values either directly or indirectly.

JVIS was revised in 1995 and the final form contains 34 basic interest scales, covering
26 work roles and 8 work styles. It is designed in such a way that it is equally applicable to
both sexes, although separate percentile norms for males and females are available.

Today JVIS is basically used for the career education and counselling of high school and
college students. It is also used in planning careers for adults. It contains 289 statements
describing various job-related activities. It takes about 45 minutes to complete and from the 34 basic
interest scales, scores on 10 General Occupational Themes are derived: Expressive, Logical,
Inquiring, Practical, Assertive, Socialized, Helping, Conventional, Enterprising and
Communicative.

Psychometric properties of JVIS are sound. Reliability for the ten occupational themes is about
.89 and the test-retest reliability of the 34 basic interest scales ranges from .84 to .88. Validity studies
have suggested that JVIS predicts the interest pattern of university students better than any other
measure. JVIS has both hand-score forms as well as machine-score forms.




The Career Assessment Inventory (CAI) 


The Career Assessment Inventory (CAI) was first developed by Johansson (1976) and
was revised in 1985. It is based on the Strong Inventory. It is designed specially for persons
who seek a career but do not have a college degree or any advanced professional training. It
basically concentrates on skilled trades, technical work and various types of service occupations.

The CAI has 305 items which are grouped into three content categories: Activities, School
Subjects and Occupations. In each item, there are five response options ranging from ‘like very
much’ to ‘dislike very much’. Because the CAI is written at a sixth-grade reading level, it can also be
used with adults who have poor reading skills. It provides scores on three major types of scales,
including Holland’s six General Theme scales, 22 homogeneous Basic Interest Area scales and
89 occupational scales (Kaplan & Saccuzzo 2001). Administrative Indices and four
nonoccupational scales have also been included.


Self-Directed Search (SDS) 


The Self-Directed Search (SDS) was developed by Holland (1985, 1992), whose hexagonal
model of general occupational themes has attracted wide attention. It is a unique interest
inventory in the sense that it is a self-administered, self-scored and self-interpreted vocational
counselling instrument (Spokane & Catalano 2000). The SDS contains 228 items. A set of 66
items, grouped into six scales with 11 items each, describes activities. Another set of 66 items
assesses competencies, which are again grouped into six scales of 11 items each. Occupations are
evaluated by six scales of 14 items each. Self-estimates are obtained in two sets of six ratings. The
respondent fills out the SDS, scores the responses and calculates six summary scores,
corresponding to the themes of the Holland model. These six summary scores can be used to
obtain codes that reflect the highest areas of interest. The SDS also attempts to simulate the
process of counselling by allowing respondents to list occupational aspirations, then
indicate occupational preferences in six areas and ultimately, rate abilities and skills in these
areas. Using the SDS, the respondents can also build a meaningful personal career theory which,
going beyond interests, includes readiness for career-decision making and readiness to obtain
guidance (Reardon & Lenz 1999). Recently, the SDS has also acted as a link to an occupational
finder. In the 1994 edition of the SDS, the respondent can locate over 1300 occupations and
match his interest code to corresponding occupational choices.
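
The step from six summary scores to an interest code can be sketched as follows. The sketch assumes
that each summary score is the sum of the section scores for that theme, that the code consists of
the three highest themes, and that ties are broken in R-I-A-S-E-C order; these rules and all the
scores shown are assumptions made for illustration, not a reproduction of the SDS scoring booklet.

```python
THEMES = "RIASEC"


def summary_scores(sections):
    """Add up the section scores for each Holland theme."""
    return {theme: sum(section[theme] for section in sections) for theme in THEMES}


def summary_code(scores, length=3):
    """Return the letters of the highest-scoring themes, highest first."""
    # sorted() is stable, so tied themes keep their R-I-A-S-E-C order.
    ranked = sorted(THEMES, key=lambda theme: -scores[theme])
    return "".join(ranked[:length])


# Hypothetical section scores for one respondent.
activities = {"R": 2, "I": 8, "A": 6, "S": 10, "E": 4, "C": 1}
competencies = {"R": 3, "I": 7, "A": 5, "S": 9, "E": 5, "C": 2}
occupations = {"R": 1, "I": 9, "A": 7, "S": 12, "E": 3, "C": 2}
self_estimates_1 = {"R": 3, "I": 5, "A": 4, "S": 6, "E": 4, "C": 2}
self_estimates_2 = {"R": 2, "I": 6, "A": 4, "S": 6, "E": 3, "C": 3}

totals = summary_scores(
    [activities, competencies, occupations, self_estimates_1, self_estimates_2]
)
print(totals)
print(summary_code(totals))
```

The resulting code can then be looked up against occupations that carry the same or a similar
code.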

Of these models of interests, Kuder’s model seems more suitable for Indian conditions. Singh
(1965) adapted the Kuder Preference Record (Vocational) in Hindi; Naik (1969) adapted the 
same form in Oriya; Parikh (1971) adapted the vocational form in Gujarati; and Gopalan (1972) 
adapted it in Malayalam. Trivedi (1969) constructed an interest inventory for undergraduate 
students. Palsane (1975) also constructed an interest inventory in Marathi for school and 
college-leaving students. 


Value Tests 


Value tests intend to measure generalized and dominant interests and in this way, they are
different from interest tests, which intend to measure interest in specific activities or areas. Value
tests are mainly concerned with preferences for ‘different ways of life’. In general, values refer to a
shared, enduring belief about ideal modes of behaviour or end states of existence. Values shape
attitudes, instill actions and determine efforts to influence others. Values also arise in response to
the societal and cultural conditions. Psychologists have paid comparatively little attention
to the development of value tests as compared to interest tests. As a consequence, no fully
satisfactory instruments for assessing human values are available. However, this does not mean
that no tests of values have been constructed so far. A very popular measure of values is the Study
of Values (SOV) developed by Allport, Vernon & Lindzey. The Study of Values is based upon the six
categories of values, which were originally suggested by Spranger’s Types of Men (1928).




The six categories of values are: theoretical (T), economic (E), aesthetic (A), social (S), political (P)
and religious (R). According to Spranger the generalized and dominant interest of the theoretical
person is the discovery of truth; the generalized and dominant interest of the economic man is in
what is useful and practical; the aesthetic man is interested in giving the highest value to form and
harmony and to the enjoyment of each unique experience gracefully; the dominant interest of a
social person is love of people; the political person places the highest value on power and
influence and the religious person places the highest value on the unity of all experiences in a bid
to understand the cosmos as a whole. Items in the Study of Values present problem situations,
under each of which the person is asked to choose either from two alternatives or from multiple
alternative responses. The responses become the index of the relative strength of the six
basic values.


Readers should not assume that a person can be wholly placed under any one of the above
categories. In reality, a person may be classified under more than one category and subsequently,
his most dominant interest (that is, value) may be determined. Studies have revealed sex
differences: male adults tended to shift toward a T-E-P profile whereas female adults tended to
shift toward an A-S-R profile.


Rokeach (1973), accepting a debt to Allport and the Study of Values, developed another
important value test called the Rokeach Value Survey (RVS). This test incorporates two broad kinds of
values: instrumental values and terminal values. Instrumental values refer to the desirable modes
of conduct whereas terminal values refer to the desirable end states of existence. Ambition is an
example of an instrumental value and family security is an example of a terminal value. The RVS
assesses 18 instrumental values and 18 terminal values. All these 36 values are listed in Table 10.7.


Table 10.7 The 36 value constructs of the Rokeach Value Survey (RVS)

Terminal values
 1. A Comfortable Life            10. Inner Harmony
 2. An Exciting Life              11. Mature Love
 3. A Sense of Accomplishment     12. National Security
 4. A World at Peace              13. Pleasure
 5. A World of Beauty             14. Salvation
 6. Equality                      15. Self-respect
 7. Family Security               16. Social Recognition
 8. Freedom                       17. True Friendship
 9. Happiness                     18. Wisdom

Instrumental values
 1. Ambitious                     10. Imaginative
 2. Broadminded                   11. Independent
 3. Capable                       12. Intellectual
 4. Cheerful                      13. Logical
 5. Clean                         14. Loving
 6. Courageous                    15. Obedient
 7. Forgiving                     16. Polite
 8. Helpful                       17. Responsible
 9. Honest                        18. Self-controlled







The examinees are asked to rank separately the 18-terminal and 18-instrumental values 
based on their importance for self as the guiding principles in life. The rank for each item 
becomes the score for that value. Ties are not allowed and therefore, value scores will range from 
1 to 18. The lower scores indicate greater importance. 
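
The ranking rule just described (each value ranked once, no ties, the rank serving as the score) can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `score_rvs_ranking` and the dictionary input format are hypothetical and are not part of the RVS materials; only the 18 terminal value names and the 1-to-18 scoring rule come from the text.

```python
# Terminal values as listed in Table 10.7 of the text.
TERMINAL_VALUES = [
    "A Comfortable Life", "An Exciting Life", "A Sense of Accomplishment",
    "A World at Peace", "A World of Beauty", "Equality", "Family Security",
    "Freedom", "Happiness", "Inner Harmony", "Mature Love",
    "National Security", "Pleasure", "Salvation", "Self-respect",
    "Social Recognition", "True Friendship", "Wisdom",
]


def score_rvs_ranking(ranks, values=TERMINAL_VALUES):
    """Check one examinee's ranking and return (value, score) pairs,
    ordered from most important (score 1) to least important.

    `ranks` maps each value name to the rank the examinee assigned.
    Hypothetical helper: the name and input format are assumptions.
    """
    if set(ranks) != set(values):
        raise ValueError("every value must be ranked exactly once")
    # Ties are not allowed, so the ranks must be exactly 1..n.
    if sorted(ranks.values()) != list(range(1, len(values) + 1)):
        raise ValueError("ranks must run from 1 to n with no ties")
    # The rank is the score; a lower score means greater importance.
    return sorted(ranks.items(), key=lambda pair: pair[1])
```

The same function can be applied to the 18 instrumental values by passing that list as `values`, since the two sets are ranked separately.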

The RVS has some limitations. First, the survey possesses marginal reliability which 
automatically means that the test should not be used for individual guidance. Second, the survey 
ignores several other important values like physical well-being, individual rights, thriftiness and 
carefreeness. Third, individual values are not defined in detail and Rokeach has shown excessive 
dependence upon a single item for each value instead of using multi-item indices for the different
value constructs.

Some value tests have also been developed in the Indian context. Prasad (1968) has
developed an inventory of vocational values for undergraduate and postgraduate students.
The inventory covers nine areas, namely, altruism, physical conditions of the work, power,
prestige, security, economic returns, self-enhancement, social climate and tradition. The
inventory has a total of 72 paired items. The test-retest reliability coefficient of the inventory ranged
from 0.80 to 0.92 and the internal consistency coefficients ranged from 0.90 to 0.92. Bhatnagar & 
Tandon (1975) and Verma (1972) have adapted the Allport-Vernon-Lindzey Study of Values in 
Hindi. Nazre (1968) adapted the Survey of Interpersonal Values (SIV) in Hindi. Value tests based 
upon the Spranger Type have also been developed in Hindi by Singh & Sharma (1975) and by 
Ojha (1975). Chauhan & Tiwari (1975) also developed a value test (dimensional), which 
measures cosmopolitanism vs localism, democraticism vs aristocratism, scientism vs fatalism, 
venturesomeness vs nonventuresomeness, etc. 


Measures of Attitudes 

An attitude is best defined as an enduring system of the cognitive component, the feeling
component and the action tendency component, all of which centre round an object, person,
event, etc. The cognitive component relates to belief regarding the object of attitude, the feeling 
component relates to the emotions regarding the object of attitude, and the action tendency 
component (also known as the behavioural component) relates to the action or behavioural 
readiness associated with the object of attitude. It is obvious from the above interpretation that an 
attitude has a well-defined object of reference. The degree of a person's attitude may vary from 
favourable through neutral to unfavourable. 

There are numerous measures of attitudes. One of the most popular measures of attitudes is 
the self-report inventory where there are a large number of statements, which directly ask the 
persons what their attitudes are. Such self-report inventories are known as attitude scales. Thus,
attitude scales consist of a set of statements to be responded to. It is common for a person to
hesitate when he is asked to express his attitude directly. Not only this, sometimes he gives it a
social colour, too. In order to remove these difficulties, some indirect measures of attitude have
also been developed.

In the measurement of attitudes, the following two underlying assumptions are basic: 


1. In any study of attitude measurement, it is assumed that an individual’s behaviour with
respect to the object of attitude will be consistent from one situation to another. If, for example,
an individual dislikes co-education, he would continue to dislike co-education in all situations.
If consistency is not exhibited, it is difficult to assess the individual’s attitude.

2. Attitude cannot be observed directly. It is, therefore, assumed that it must be inferred from
the statements and actions of a person, that is, it must be inferred from the behaviour (verbal or
nonverbal) of the person. The statements and actions towards the object of the attitude reveal
the degree of favourableness or unfavourableness, which together constitute the valence of
the attitude.







As mentioned earlier, there are numerous direct and indirect measures of attitude. 
Campbell (1950) has categorized them under four major headings. 


Nondisguised Structured Tests 


These tests attempt to measure attitude on the basis of direct statements regarding the object of 
attitude. The Thurstone Scale, Likert Scale, Guttman Scale, etc., are a few examples of this 
category. (A detailed discussion of these various attitude scales is presented in Chapter 13.) 


Nondisguised Nonstructured Tests 
Under this heading, the following methods of assessing attitude are included: 


1. Survey interview: The survey interview is a nondisguised nonstructured technique for
measuring the attitude of a person. It aims at interviewing as many persons as
possible in a short time. Interviewers usually put either of two types of questions: fixed-alternative
questions and open-ended questions. In the fixed-alternative questions, the person must choose
his answer from the given alternatives. No provision for other categories of answer is made. Two
examples illustrating the fixed-alternative questions are given below.

(a) Which party do you think would do a better job of improving the law and order situation
in the country in the next few years? (i) Congress (I) (ii) BJP (iii) RJD

(b) Which party do you think has the largest number of honest leaders? (i) Congress (I) (ii) BJP
(iii) CPI

The open-ended questions have no prescribed alternatives. Here persons are free to respond
in whatever terms and frames of reference they like. The following examples illustrate the
open-ended questions:

1. Why did the Janata Party suffer a defeat in the 1980 mid-term Lok Sabha poll?
2. What do you think about the prospects of the Congress (I)?


In the open-ended questions, the interviewer usually adopts nondirective means for getting full
answers and co-operation from persons. He encourages an individual to talk freely and, if
necessary, uses some mild probes like “Would you like to say more?” or “Why?” or “How?”, etc.
Both the fixed-alternative and the open-ended questions have been frequently used in assessing
attitudes. Since fixed-alternative questions have certain advantages like simplicity and scoring
economy, they usually yield more reliable data than the open-ended questions. But this does not
mean that the open-ended questions should be abandoned in favour of the fixed-alternative
questions. In fact, both types of questions can be used together: the open-ended questions for
exploring general problems and contexts of the object of attitude, and the fixed-alternative
questions for exploring specific problems and contexts of the object of attitude.

2. The biographical and essay methods: In this method attitude is assessed from the opinions
expressed in letters, autobiographies, diaries, written or oral interviews, etc. In written interviews,
sometimes the persons are asked to write short essays concerning the objects of attitude.
Opinions, revealed through all these sources, help the investigator in assessing the attitude of the
persons. One general demerit of these methods is that while writing letters or an autobiography
or biography, the persons sometimes do not express the true content of a humiliating event
because they know that such materials may further expose them to humiliating situations.


Disguised Nonstructured Tests 


Under this heading the typical projective tests, originally developed to measure personality
variables like want, need, anxiety, etc., are employed. A few examples of the projective tests are
the Rorschach test, TAT tests, the Word-association test, the World test, etc. (A detailed discussion
of projective tests is presented in Chapter 11.) One of the basic assumptions underlying these
projective tests is that when the individual makes a specific interpretation about the situation
presented through these techniques, the interpretation is not determined by something in the
picture itself but by something about himself (his motives, attitudes, values, etc.). Proshansky
(1942) devised one such projective test for studying attitudes towards labour. From various
sources, he collected some pictures which obviously had something to do with working people
in conflicting situations. Subjects were shown these pictures with the instruction to describe them
in as much detail as possible. They were also asked to give their details in the form of a story.
Subsequently, three qualified judges rated each story as to the degree of favourableness or
unfavourableness of attitude towards labour. Proshansky found a very high agreement between
the attitude evaluated by the judges and the picture responses. He, however, found two serious
limitations in projective-type tests. First, the descriptions of the pictures given by the individuals
are sometimes so vague and ambiguous that scoring becomes difficult. Second, judges are
influenced by their personal biases in ratings. These two limitations make the projective-type
tests of attitude unreliable and invalid.


Disguised Structured Tests 


Under this heading again, the indirect methods of assessing attitudes are included. These indirect 
measures are different from the projective tests and very closely resemble the objective testing of 
attitudes as it is found in the self-report inventories. One example of such a disguised-structured 
test is the error-choice technique developed by Hammond (1948) to measure attitudes towards 
labour management and towards Russia. The technique requires a number of nonfactual items 
(or error-choice items), each of which allows two alternatives, out of which one that seems to be 
the correct one, is to be selected by the subject. In a test of attitudes towards labour, one of the 
items was like this: 
The average weekly wage of the war worker in 1945 was...... . (a) $37, (b) $57. 

In reality, both of the above averages were wrong, but the subject was forced to choose one,
and the choice between the alternatives depended largely upon the subject’s attitude. Likewise, in a
test of attitude towards Russia, one of the items was as under:

Russia’s removal of heavy industry from Austria was...... . (a) legal, (b) illegal. 

In the above item or similar other items, facts were really indeterminable and hence a 
subject’s response of choosing either “legal” or “illegal” revealed his own attitude towards 
Russia. Hammond prepared a large number of nonfactual items (or error-choice items) and 
subsequently, presented them to the subjects. The labour test and the Russia test were 
administered under the guise of an ‘information test’, that is, the subjects were told that the test 
was merely intended to gather information and therefore, their co-operation was needed. The 
systematic errors in checking the alternatives on error-choice items in one or the other direction 
were taken as a measure of attitude towards the object. Hammond had recommended that for 
effective use of the error-choice techniques, the number of alternatives might be increased from 
two to three or five. Kubany (1953) used the error-choice technique for measuring attitudes 
towards national health insurance and found that the indirect technique made more valid 
distinctions among persons holding favourable or unfavourable attitudes towards national health 
insurance.
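
The idea of taking systematic errors in one direction as the measure of attitude can be sketched in a few lines. The sketch below is an illustration only, not Hammond's actual scoring procedure: the item identifiers, the +1/-1 direction codes and the function name `error_choice_score` are all hypothetical. It assumes that each alternative of an error-choice item has been coded in advance for the direction in which it errs.

```python
# Hypothetical scoring key: for each item, the direction of error that each
# alternative represents (+1 = error favourable to the attitude object,
# -1 = error unfavourable to it). These codings are illustrative only.
SCORING_KEY = {
    "item_1": {"a": +1, "b": -1},
    "item_2": {"a": -1, "b": +1},
    "item_3": {"a": +1, "b": -1},
}


def error_choice_score(responses, key=SCORING_KEY):
    """Sum the direction codes of the alternatives a subject chose.

    `responses` maps an item identifier to the chosen alternative.
    A positive total means the errors lean in the favourable direction,
    a negative total means they lean in the unfavourable direction.
    """
    score = 0
    for item_id, choice in responses.items():
        score += key[item_id][choice]
    return score
```

A subject whose errors are unsystematic would be expected to score near zero under this coding, whereas a consistently favourable or unfavourable subject would score near the positive or negative extreme.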


It is obvious from the above discussion that the first two measures (nondisguised structured
tests and nondisguised nonstructured tests) are direct measures of attitude, whereas the last two
measures (disguised nonstructured tests and disguised structured tests) are indirect measures of
attitudes. The indirect measures of attitudes have two principal advantages over the direct
measures. First, indirect measures enable the investigator to measure the attitude of the individual
without his being aware. It has been observed that when individuals are aware that their attitudes
are being assessed (as is the case with direct measures of attitude), this feeling in itself produces a
change in the attitude being measured. Thus, indirect measurements enable the investigator to
measure an attitude without any effect upon the attitude itself. Second, for assessing attitudes
towards an object or event, which is considered more or less confidential or private and which a
person may not want to reveal in public, indirect measures are more reliable and valid. For
example, when the investigator is studying the attitude of unmarried mothers towards sex,
indirect measures would yield more valid data than direct measures because unmarried
motherhood is regarded as a violation of social norms and is stigmatized in Indian society.

Review Questions

1. Discuss the research importance of self-report inventory and situational tests as tools of
   personality assessment.
2. Write a review of some of the important self-report inventories developed in Indian
   conditions.
3. Throw light upon the importance of interest tests and value tests in psychological
   researches.
4. Discuss the research utilities of MMPI and Bell Adjustment Inventory in psychological
   researches.
5. Which measures of attitude do you consider best? Give reasons.




11 
PROJECTIVE TECHNIQUES 


CHAPTER PREVIEW
• Meaning and Types of Projective Techniques
• Classification of Projective Techniques
• Pictorial Techniques
• The Rorschach Test
• Interpretation of the Rorschach Protocol
• The Holtzman Inkblot Test
• Thematic Apperception Test, or TAT
    Derivatives of TAT
• Verbal Techniques
    Word-Association Test
    Sentence-Completion Test
• Expressive Techniques
    Figure-Drawing Tests
    Toy Tests
    Artistic Productions
    Graphology
• Evaluation of Projective Techniques
    Fakability
    Objectivity
    Standardization
    Reliability
    Validity
    Situational Variables
    General Applicability


MEANING AND TYPES OF PROJECTIVE TECHNIQUES 


Projective techniques, which originated in a clinical setting, [...] personality. The history of
projective assessment goes back to [...], who is said to have selected pupils on the basis of [...]
ambiguous form (Piotrowski 1972). In modern [...] projective technique, a word association test,
in 1879. This procedure [...] by Kent and Rosanoff into testing in 1910 and it was frequently used
by [...]. [...] projective techniques gradually evolved into projective tests. In fact, [...] introduced
the term projective method for describing [...] personality with unstructured stimuli. In projective
tests, the individual is given [...] to which he responds. By an unstructured situation, we mean a
situation whose meaning and interpretation vary from individual to individual. Such situations
have no right or wrong answers and are capable of evoking fantasy material from the testees
(Lindzey 1961). The basic assumption of projective techniques is that while responding to an
unstructured situation, the individual projects his own feelings, needs, emotions, motives, etc.
(which are [...]) without being aware of doing so. Since the individual is not aware of [...], he does
not resort to any defensive reactions. Thus, in a projective test, the individual has ample
opportunity to project his own personality attributes, which are mostly [...] and unconscious, in the
interpretation of an unstructured situation. Such latent and concealed experiences are generally
incapable of exposure by the questionnaire type of test.
Projective tests differ from the self-report inventories and observational method of assessing
personality. Self-report inventories provide a direct and structured situation for assessing the traits
of personality, whereas the projective tests provide an indirect and unstructured situation for
assessing these traits. In self-report inventories attention is focused upon the measurement of
traits, that is, upon the measurement of personality traits individually, whereas projective tests are
characterized by the global approach to the assessment of personality, that is, in the projective
tests attention is focused upon the measurement of personality as a whole. Projective tests are
also different from the observational methods, which are primarily dependent upon the fact that
the observer must know the person being observed in detail and he must report the observation
without any bias. Projective tests require no such observer and the individual, to a greater extent,
gives an objective report of his observation of an unstructured situation.


CLASSIFICATION OF PROJECTIVE TECHNIQUES 


There are different types of projective techniques, and different psychologists have
classified them into different categories. The earliest classification of projective techniques was
done by Frank (1939). His classification is mainly based upon the nature of response evoked by
the materials of projective techniques.

On this basis, he classified projective techniques into five categories as given below.

Constitutive


This category includes all those test situations in which the examinee constitutes or frames
structures upon materials, which are yet unstructured. Finger-painting and drawing-completion
are its best examples. According to Zubin, Eron and Schumer (1965) the Rorschach test also falls
in this category in the sense that here the examinee imposes his own structure upon unstructured
inkblot situations.


Constructive 


The constructive category, though apparently similar to the constitutive category, includes all
those test situations where the examinee is required to construct a specified task. He is required to
impose a degree of structure upon the situation in the direction specified by the examiner. The
examinee may be asked to draw a figure of a human male or female and this would be included
in the constructive category rather than in the constitutive category where drawing according to
the examinee’s own wish or desire is done. Thus, the constitutive category test situations allow a
free expression of the examinee’s inclination whereas the constructive category test situations do
not give such permission.


Interpretative 


This category includes all those test situations where the examinee is required to add a
comprehensive meaning to the situation. The Thematic Apperception Test (or TAT) and the word
association test are included under this category.


Refractive

Under this category are included all those techniques through which the examinee is given an
opportunity to express his personality in the form of painting, drawing, handwriting, etc.
Graphology (or handwriting) has been cited by Frank as the best example of this category.




Cathartic 


This category includes those situations whereby the examinee is given an opportunity through
some manipulative tasks for the release of [...], wish, etc. Play techniques are cited as its
best example.

In fair evaluation of Frank’s classification, it can be said that his classification solves less and
creates more problems. One of the biggest limitations is that his categories overlap so much
that the same test can be included under more than one or two categories. For example, a
figure-drawing test may be included in either the constitutive or constructive or [...]
category. Frank's classification, therefore, is not a widely accepted classification of
projective tests.

The more convincing classification of projective techniques has been provided by
Lindzey (1959). Based upon the responses of the examinees, he has divided projective
techniques into the following five categories:


Association Techniques 

This category includes all those situations where the examinee is required to respond with the 
associations which are evoked in his mind after seeing or listening to stimulus materials. The 
Rorschach test, the Holtzman Inkblot test and the word-association test are its best examples. The 
Rorschach test requires the examinee to respond to an unstructured situation of inkblots in the 
form of verbal associations with objects, events, persons, etc. No attempt is made to mould those 
associations either by the examinee or by the examiner. Similarly, in the word-association test the 
examinee is presented with a variety of words one after another with the instruction to respond 
with the very first word that comes to his mind after listening to the stimulus word. Subsequently, 
the responses and the reaction time (the time elapsing between the presentation of the stimulus 
word and the response word) are analyzed for studying the personality. 


Construction Techniques 

This category includes all those situations where the examinee is required to construct a story 
after seeing the stimulus materials (usually a picture) within a certain specified time. No record is 
generally kept of time but the examinee’s themes and mode of responding are considered 
relevant. The Thematic Apperception Test (or the TAT), the Children’s Apperception Test (or the 
CAT), the Blacky Pictures, the Object Relations Technique and the Pickford Projective Picture 
(PPP) are some of the best examples of construction techniques. In all these tests, the examinee is 
required to construct or produce simple or complex statements in the form of a story. A discussion 
of all these tests is beyond the scope of this book. However, some of them appear in detail in later
sections of this chapter.


Completion Techniques 

These techniques include those situations where the examinee is presented with some
incomplete sentences with the instruction to complete them in any way he desires. A few
examples which illustrate the techniques are given below:

I feel tense ......
My ambition in life ......
I often get nervous ......

Responses given by the examinee are interpreted and analyzed to find some clue regarding his
personality. These techniques, however, lack a uniform and standard mode of analysis. Stories
have also been used as completion techniques. The Madeleine Thomas Completion Stories test
(Mills 1953) and Rotter’s Sentence Completion test are examples. The Rosenzweig
Picture-Frustration Study (Rosenzweig 1949) is another example of completion technique.







Expressive Techniques 


This technique includes those situations where the examinee expresses his personality through
some manipulative tasks, which usually involve some interaction with given materials. Play,
drawing, role-playing, painting, finger-painting, etc., are the common expressive techniques.

One important feature of expressive techniques is that the examiner pays much attention to the
way or process by which the examinee manipulates the given materials. For example, he may
ask the examinee to play with a given set of dolls; he may pay attention to the process by which
the dolls are selected and handled during the play. Thus in the expressive techniques, attention is
given to the process and not to the end product of the process. Expressive techniques, in this way,
are different from construction techniques because here much emphasis is given to the process or
way of handling the test materials rather than upon the end product of the process (such as the
content or theme of the stories, etc.). The famous Toy-World Test is an example of expressive
technique.

Choice Techniques 


Choice techniques (also known as ordering techniques) are not projective techniques in the true
sense of the term; rather they may be regarded as a step towards objectifying the projective
techniques (Kerlinger 1973). Usually, the examinee is presented with some sets of pictures or
items (which convey the different degrees of a trait) with the instruction to choose the most
relevant and appropriate picture. Sometimes, he may be asked to order or rank those pictures in
terms of his preferences and hence the name ordering techniques. The choice of the subject
becomes the basis for the inference regarding his personality. The Szondi test is an example of
ordering technique because the examinees are required to rank sets of pictures along a
like-dislike dimension.


Still another important classification of the projective techniques has been done by Best
(1978). He has proposed a four-way classification of projective techniques as stated below:

Association

[...] cartoons, etc., and is asked to tell what he sees, [...] association test
belong to this category.


Completion 


This category incorporates the situation where some incomplete sentences are presented to the
examinee who completes them in the way he likes. The sentence-completion test belongs to this
category. Such tests have been found to be most useful in interpreting traits like anxieties,
guilt-feelings, hostility, aggression, attitudes toward sexuality, paranoid tendencies, etc. Items
illustrating paranoid tendencies may be framed like this:

Somebody is always trying ......

I worry about ......

I think that my body ......


Role-playing

This technique requires the examinee to act out a specific role in a group of two or more
individuals for a certain period of time. It has been [...] observed that when an examinee is
asked to play a certain role, he says surprising [...] have said otherwise.
His sayings reveal the underlying motives, wishes, [...], etc. The role-playing
technique has been found to be most suitable for traits like hostility, sympathy, frustration,
dominance and authoritarianism.




Creative or Constructive

This technique includes all those test situations where the examinee is required to do some tasks
like playing with dolls or toys, painting or finger-painting, drawing a figure, etc. Best also says that
sometimes the examinee may be asked to write imaginative stories about some assigned
situations. Hence, TAT is also included under this head. Generally, on the basis of the choice of
colours, words, forms, order of the sentences, etc., an inference regarding the traits of the
personality of the examinee is drawn.

A comparative study of Lindzey’s classification and Best's classification makes it clear that
the former is more appropriate than the latter. The role-playing and the constructive techniques of
Best are covered under one category, that is, the expressive techniques of Lindzey. Besides this,
Lindzey’s classification is very broad and fact-covering. Almost all the projective techniques are
covered under any one of the categories without much overlapping.

One way to reduce the confusion and overlapping in classification of projective techniques
is to classify them into three broad categories: the pictorial techniques, the verbal techniques and
the expressive techniques.

Pictorial

These techniques include all those situations where the unstructured situation consists of vague
and ambiguous pictures and the examinee is to respond towards those pictures. His response
may be in terms of a few words as it is done in the Rorschach test and Holtzman Inkblot test, or in
terms of a series of sentences as it is done in the TAT, the CAT, the Rosenzweig Picture Frustration
test, etc.


Verbal 

In this technique, the stimulus materials are presented in verbal form such as in terms of words or 
incomplete sentences. The examinee may be required to respond to those words with the first 
word that comes to his mind and in case of incomplete sentences, he is required to complete 
them in a way he likes. Thus the verbal techniques are different from the pictorial techniques in 
the sense that here the stimulus materials are verbal and not pictorial. The word-association test 
and the sentence-completion test are its best examples. 


Expressive 

This technique incorporates all those situations whereby the examinee is given an opportunity to 
express his personality mostly through some manipulation and objective tasks. Graphology, 
painting, drawing, playing with dolls and role-playing are some of the common expressive 
techniques. The way or the manner in which the examinee does his task in the above-mentioned 
situations becomes the basis for appraisal of his personality traits. 

Thus, we see that there are different types of projective tests. These varied projective tests
have the following characteristics in common:

(i) All projective tests share one common assumption called the projective hypothesis, which
states that a person will project something important about himself or herself on to a
vague or ambiguous stimulus, usually called an unstructured stimulus.

(ii) The procedures used in the projective tests are usually disguised to some degree. In other
words, the testees do not know the way their responses will be analyzed and interpreted.

(iii) Projective tests are relatively unstructured and therefore, the task incorporated generates
an inexhaustible variety of possible responses.


(iv) In projective tests, the instructions given to the testee or examinee are [...]
themselves. Instructions like ‘Tell me what you see’, ‘There are no right or
wrong answers’, or ‘What might this be?’ are common.

(v) Projective tests usually [...] a global approach for assessing the person's
personality. In other words, the psychologist [...] together a comprehensive [...]
of the testee’s or the examinee’s psychological functioning rather than only identifying a
particular trait.

(vi) In the clinical situation, the projective tests are used in an idiogr[...]
projective tests in such a situation are used to develop a model of [...]
of an individual.

(vii) Supporters of the projective tests state that these techniques are the only effective [...]
evaluating unconscious elements of personality. The basic assumption is that we
get such information through projective tests that cannot be gathered otherwise.

(viii) The interpretation of responses of projective tests has been strongly influenced by
psychoanalytic thinking. The concept of psychic determinism and the importance
placed upon the various unconscious processes paved the way for projective tests.

Thus despite their varied types, the projective tests have some degree of commonality.


PICTORIAL TECHNIQUES 


As mentioned earlier, pictorial techniques are those techniques where the unstructured situation
consists of vague and ambiguous pictures to which the examinee is to give a response. The
responses may be in terms of words or a series of sentences. The Rorschach test, the Thematic
Apperception test and the Holtzman Inkblot test are the best examples of pictorial techniques. A
detailed discussion of these is given below.


THE RORSCHACH TEST 


The most popular projective technique is the Rorschach Inkblot test developed by Hermann
Rorschach, a Swiss psychiatrist, in 1921 to make a diagnostic investigation of personality as a
whole. At the outset, it must be borne in mind that the Rorschach test is a measure of both the
intellectual and nonintellectual traits of personality. Rorschach investigated with a large number
of inkblots, out of which only the 10 inkblots that differentiated most between various psychiatric
syndromes were selected to constitute a test. Thus the Rorschach test consisted of ten cards, each
of which contained a bilaterally symmetrical printed inkblot. Five inkblot cards (Cards I, IV, V,
VI and VII) are made in shades of black and gray; two cards (Cards II and III) contain bright
patches of red in addition to the shades of black and gray; and the remaining three cards (Cards
VIII, IX and X) contain several pastel shades. An inkblot like that used in the Rorschach test is
illustrated in Figure 11.1.

Fig. 11.1 Rorschach inkblot

Major life events of Hermann Rorschach are given below:




Major life events of H Rorschach

1. H Rorschach was born on 4 November 1884 at Zurich, Switzerland.
2. Rorschach visited Russia in 1906 and started loving Russia and Russians.
3. In 1910, Rorschach married a Russian girl named Olga Stempelin.
4. At the age of 10, that is, in 1894, Rorschach was very much fascinated with Alfred Binet’s
   idea of using inkblots for assessing the functioning of personality.
5. Rorschach started working on inkblots in 1911.
6. In 1912 Rorschach received an MD in psychiatry under the supervision of Eugen Bleuler,
   who was the teacher of CG Jung and had coined the term schizophrenia.
7. The title of the PhD thesis of Rorschach was ‘On Reflex Hallucinations and Kindred
   Manifestations’.
8. Based on his doctoral research, Rorschach published a research paper entitled ‘Reflex
   Hallucination and Symbolism with Psychoanalytic Framework’ in 1912.
9. Between 1912 and 1914, Rorschach published many research papers on various clinical
   issues.
10. Rorschach also encouraged pet therapy, where he used to bring his pet monkey to the
    mental hospital for studying his patients’ reactions while they were being entertained by
    the monkey.
11. Rorschach’s most important work on the inkblot test, Psychodiagnostik, was published
    in 1921.
12. On 2 April 1922, Rorschach died of appendicitis at the age of 37.


The entire procedure of the Rorschach test may be presented under the following three
headings: administration, scoring and interpretation.

Administration

The administration of the Rorschach test is conveniently divided into three stages: the first stage,
which is known as the performance proper, the second stage, which is known as the inquiry, and
the third stage, known as the testing-of-the-limits. In the first stage, the examinee is asked to take
his seat and then the examiner talks with him for a few minutes to establish rapport. Handing over
the first card to the examinee with the top side up, Rorschach’s basic instruction “What might
this be?” is given. Klopfer & Kelley (1942, 32) modified and extended the basic instruction of
Rorschach as under: “People see all sorts of things in these inkblots; now tell me what you see,
what it might be for you, what it makes you think of.” The examiner carefully notes down the
following events:

1. He notes down the reaction time, that is, the time which elapses between the
presentation of the card and the examinee’s first scorable response. The reaction time is symbolized
as t (Semeonoff 1976). Exclamations and comments are not counted as responses. However, they are
noted down by the examiner.

2. He systematically notes down the position of the card when the response is being given.
When the top is kept upright, the symbol used for this is ∧; when the top is turned downward,
the symbol becomes ∨; when the top is kept at the left side, the symbol used is <; when the top is
kept at the right side, the symbol becomes >; and when the examinee rotates the card without
stopping, the symbol used is (0). Some experts use no symbol at all when the top is kept upright.

3. The responses are recorded verbatim. There are two basic purposes for recording the
responses in this way. First, it enables the examiner to read them clearly during the time of scoring
and thus, it increases the precision of scoring. Second, it allows others to read the record so
that they may know exactly what the examinee had said.







4. The examiner records the total time for which the subject keeps each card. This is
symbolized by T (Semeonoff 1976). Since the examinee is free to see as many things as he likes,
this period is known as the free association period and the total time elapsed during the free
association period of each card is noted.
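The four observations listed above can be kept together as one record per card. The following is a minimal sketch in Python, not a part of the Rorschach procedure itself: the class name `CardRecord`, its field names and the position labels are assumptions made purely for illustration, and the wedge glyphs ∧ and ∨ are an assumed rendering of the upright and inverted marks. The symbols t (reaction time) and T (total free association time) follow the text.

```python
from dataclasses import dataclass, field

# Card-position symbols from item 2; the dictionary keys are illustrative
# labels (an assumption), not standard Rorschach terminology.
POSITION_SYMBOLS = {
    "top_up": "∧",
    "top_down": "∨",
    "top_left": "<",
    "top_right": ">",
}


@dataclass
class CardRecord:
    card: str                # Roman numeral of the card, e.g. "I"
    reaction_time_s: float   # t: presentation to first scorable response
    total_time_s: float      # T: total time the examinee keeps the card
    responses: list = field(default_factory=list)  # (position symbol, verbatim text)

    def add_response(self, position: str, verbatim: str) -> None:
        """Store the response verbatim together with the card position."""
        self.responses.append((POSITION_SYMBOLS[position], verbatim))


record = CardRecord(card="I", reaction_time_s=8.0, total_time_s=95.0)
record.add_response("top_up", "It looks like a bat.")
record.add_response("top_down", "Now it could be a mask.")
print(len(record.responses))  # 2
```

Exclamations and comments, which the text says are noted but not counted, could be kept in a separate list so that they never inflate the response count.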


When the examinee returns the first card, he is given the next card and so on for all the ten
cards. Thus, all the 10 cards are presented in order. The examiner should keep the
cards out of the reach of the subject and he must hand over each card to the examinee rather than
place it before him. It is common during testing for the examinee to raise some questions. In
administering this test, silence by the examiner is considered very important by
Rorschachers. However, this can be interrupted when the examinee raises questions. In
answering these questions, the examiner must adopt a nondirect approach. A few examples
illustrate the point:

Examinee: “How long should I keep the card?”

Examiner: “As long as you like.”

Examinee: “Should I turn the card?”

Examiner: “It depends upon you.” or “As you like.”

Examinee: “Should I report more than one thing?”

Examiner: “Most people usually report more than one thing.”


Thus the questions raised by the examinee are answered in a nondirect way which, in turn,
provides encouragement to him. Sometimes, it is found that the examinee rejects the cards
altogether without giving any response. If such rejection occurs on the first or the second card,
the examiner should rethink his procedure or his choice of the Rorschach test as the appropriate
test. However, if such rejection occurs on the eighth or the ninth card, the examiner should
encourage the examinee by saying, “There is no hurry; try and see something in it.” The examiner
should see that sufficient time intervenes before the examinee rejects the card. Beck (1944) and
Hertz (1969) suggest a minimum time of 2 minutes, only after which rejection can be justified.
Rorschachers differ with regard to the maximum time a card should be kept by the examinee.
Beck (1945) suggests that the examinee can keep a card as long as 10 minutes whereas
Piotrowski (1969) suggests five minutes as a reasonable maximum time for keeping a card.
Sometimes, extremely compulsive examinees have been found to go on
responding endlessly to a card. In such a situation, the examiner should interrupt.
The total number of responses, or R, on the Rorschach test for most adults varies from 15 to 30. A [...]
may indicate psychopathological traits [...]. When the examinee gives a vague and uncertain response, the
examiner should note it and it should be clarified in the second stage of administration.

Inquiry is the second stage of administration of the Rorschach test. It usually immediately
follows the first stage, that is, the ‘performance proper’, when responses to all the ten cards
have been obtained during that stage. Rapaport, Gill & Schafer (1946) are of the opinion that
inquiry should be conducted immediately after responses to each card have been obtained and with
the card placed out of sight of the examinee. Rapaport takes the plea that the examinee is apt to
forget many associations, or his thinking may be affected, by presentation of subsequent cards. It
is, therefore, suggested that inquiry should be conducted immediately after the presentation of
each card. Most Rorschachers, however, do not accept Rapaport’s viewpoint and continue to conduct
the inquiry after the responses to all the ten cards have been obtained. The basic purpose of the
inquiry is to obtain all the information necessary for scoring the responses. Therefore, the
inquiry consists of asking questions on each of the responses. The location sheet is kept before
the examinee so that it may be known what part of the blot is the basis of the
response and why the particular part gives rise to the response. The purpose of keeping the
location chart before the examinee is to make a permanent record of the area of the blot used by 
him in responding. Thus, the purpose of the inquiry is twofold. First, it allows the examiner to 
obtain further information regarding the examinee’s responses and second, it also helps the 
examinee to clarify his responses by adding and expressing more about what had already been 
said. Ordinarily, the extent to which questions are formulated during the stage of inquiry is left
entirely to the skill of the examiner. However, for success of the inquiry, the examiner should
keep some basic principles in view. First, the examiner must frame appropriate questions when 
needed. Some responses are very clear-cut and need no questions by the examiner. Questions 
put unnecessarily during the inquiry would add nothing new. Therefore, questions should be 
asked only when the examiner is in doubt regarding its location or meaning. Second, the inquiry 
questions should be framed in a nondirect way. (Examples of nondirect questions have been 
given earlier.) Third, the inquiry questions should not be long. They should be precise and brief 
and must be formulated to evoke answers which are helpful in scoring. 


Testing-of-the-limits is the third and final stage in the Rorschach administration. However,
this stage is not needed for all examinees. In some instances, it is seen that examinees give 
responses which are uncommon and not ordinarily found in most of the Rorschach protocols. For 
example, the examinee might somehow be convinced that whole responses are better than part 
responses and therefore give only whole responses to all the ten cards. In such a situation, 
testing-of-the-limits is done through adequate encouragement by the examiner in order to see 
whether the examinee is capable of changing the response in order to evoke the responses found 
in most of the protocols. The stage is so-called because the procedure intends to test the limits of 
giving ordered responses on the Rorschach test. 


Scoring 
In the Rorschach test, ‘scoring’ refers to the classification of responses into different categories so 
that the product may reveal the personality of the examinee as a whole. Regrettably, Rorschach 
died before he could have conceptualized his scoring system. Therefore, it was left to his 
followers to complete the systematization of Rorschach scoring. Five American psychologists, 
namely, S Beck, M Hertz, B Klopfer, Z Piotrowski and D Rapaport tried their best and produced
overlapping but independent approaches to the test. It was found that nuances of scoring vary from
one scoring method to another. Therefore, Exner and his colleagues have developed the
Comprehensive Scoring System (CSS) by synthesizing these earlier approaches (Exner 1993, Exner
& Winer 1995). The CSS agrees that from the point of view of scoring, there are four main
categories of classification of responses. 

1. Location 

2. Determinants 

3. Content 


4. Popular Responses and Original Responses 
A detailed discussion of the four categories is presented below. 


Location 


Location is the first and easiest system in Rorschach interpretation. Location refers to the part of 
the blot which produces a particular response. Whether the whole blot is producing a response 
or only a part of the whole blot is producing the response is indicated by this category of scoring.










In the former case, it is a whole response and the scoring is direct and simple but in the latter
case, the scoring is further dependent upon whether the part used is a commonly used or a rarely
used one. A detailed discussion of the scoring based upon location of the responses, along with
the symbols, is presented below.

The W response (whole responses): When the whole blot instigates the response, it is scored
as W. Thus the criterion for the scoring of W is a sort of either-or phenomenon in which the
examinee either uses the whole blot or does not. In general, a W response is considered a good
response because it indicates an overall view of the entire situation. There are two subtypes of W
responses: the cut-off whole, symbolized as W̸ (a W struck through), and the confabulatory whole,
symbolized as DW or DdW. The cut-off whole response, which was suggested by Klopfer & Kelley
(1942), is one in which the examinee gives the response on the basis of the whole blot excluding
its minor details. For example, Card I may be perceived as ‘Bat’ excluding its lower projection or
side projections. Cards I, II and III frequently evoke the cut-off whole response. Card II may be
perceived as ‘Buffalo’ excluding the red colour whereas Card III may be perceived as ‘Two women’
excluding the red colour. Beck (1944) has, however, objected to the cut-off whole response and is
of the view that a response is either W or it is not. He argued that since there is no evidence of
scorer reliability of the cut-off whole response and also because the literature does not make
mention of any such response, it is difficult to accept it. The confabulatory whole or DW or DdW
response is one where “the examinee attends only to a detailed area of the blot, but then
generalizes from that detail to the entire blot” (Exner 1974, 57). When the examinee attends to the
usual detail, DW is used and when the examinee attends to the unusual detail, DdW is used. Such
responses obviously indicate that the examinee has not distinguished the various areas of the blot
to have a clear perception but instead, he has generalized from a single minor detail to the whole
blot. The DW or DdW response, therefore, indicates perceptual cognitive impairment. However, such
responses on the Rorschach test are very unusual.


The D response (common details): The D response stands for a response based upon the
‘usual’ or ‘normal’ (sometimes called ‘large usual’) detail or part of the blot. The fundamental
principle for a particular response to be rated as D is that a detail should be easily and frequently
seen by most examinees as separate from the remainder of the blot. It should not be confused
with a ‘popular response’ of the content category (to be discussed later on) because the latter is
different from the former. Rapaport, Gill & Schafer (1946) have defined the D response as
belonging to the area “which is conspicuous by its size, its location and the frequency of the
response it draws.” Thus the ‘normal’ or ‘usual’ details of the blot are the areas which
stand out by virtue of their position and contour. A low percentage of D responses is
indicative of maladjustment. Klopfer et al. (1954) have divided the usual or common details into
two: capital D for the large usual area and small d for the small usual area.

The Dd response (unusual details): Unusual detail responses (also called small detail
responses by Rorschach) are based upon “usually the smallest details of the picture always
overlooked by normal subjects” (Rorschach 1942). In the words of Exner (1974, 56), a Dd
response is “one given to a blot area which is not used frequently.” Thus when that part of the blot
which is not used by most examinees instigates a response, it is scored as Dd. Practically, a
response can be scored as Dd when it is neither W nor D. Klopfer and Kelley (1942) have divided
the Dd response into four subcategories and the symbols for each are different. When a tiny detail
evokes a response, it is scored as dd; when an edge or contour instigates a response, it is scored
as de; when inside areas evoke a response, it is scored as di; and when an unusual detail
does not fall into any of these categories, it is scored for rare detail as dr. A dr response is highly
idiosyncratic and rare, and sometimes the examiner fails to see the blot in the way the examinee
perceives it. Exner (1974) has, however, objected to Klopfer’s division of the unusual details on
three grounds. First, examiners tend to disagree about specific types of Dd. For example, a
response can both be tiny and rare and therefore, can be scored as dd as well as dr. Likewise,
some inside detail responses are not uncommon whereas others are extremely rare and
uncommon. The former may be scored as di and the latter may be scored as dr or both types of
response may simply be scored as di. Thus, there is room for the scorer to vary and this
unnecessarily lowers the reliability of the four categories of Klopfer’s division. Second, there is
no other evidence for such a fine division of the Dd response. The limited research base does not
make such a differentiation regarding the type of Dd. Third, the frequency of any of these
categories of Dd response is very low and contributes very little to the scoring summary. Keeping
in view what has been said above, Exner (1974) recommends that all types of Dd responses
should simply be scored as Dd. This would decrease the confusion regarding the differentiations
among scorers and therefore, would tend to increase the scorer reliability.


The S response (white space details): When the white space area is used as the basis of
response, it is scored as S. The S response is never scored alone. According to Exner (1974, 57), it
is “always used in conjunction with one of the three primary location scores such as
WS, DS, or DdS.” Rorschach himself regarded S as one form of unusual detail, that is, Dd. Beck
has also expressed a similar view and said that S responses are a subclass of D or Dd so that his
symbols become DS or DdS. The chief rationale for using S in conjunction with other location
scores is to maintain consistency in the evaluation of the three main location scores: W, D and
Dd. Some Rorschachers have, however, attempted to modify Rorschach’s original idea. They
have suggested that the space response may be scored separately as S if the basis of response is
only the white space area. But when space area is used in combination with other parts of the
blot, it must be scored as WS, DS or DdS depending upon the nature of the response. When the
white space area is scored separately, it is included in W, D or Dd. Hertz (1970) is one such
Rorschacher who recommends separate scoring for white space responses. Accordingly, she has
distinguished between two types of white space responses: the common and frequently used
white space detail scored as capital S and the rare and infrequently used white space detail
scored as small s.


The DdD response (confabulated details): The confabulated detail response symbolized as
DdD is scored in much the same way as DW and DdW. According to Exner (1974, 60), a DdD
response is “one in which the subject interprets an unusual detail of the blot and then generalizes
from that interpretation to a common detail area.” The basic principle for a response to be scored
as DdD is that a common or usual detail area is interpreted secondarily and the primary emphasis
is based upon an unusual detail area. As compared to DW and DdW, the DdD response occurs
much less frequently in a normal Rorschach protocol.


These are the basic location scores on the Rorschach test. Out of these, the first three
location scores, namely, W, D and Dd are the primary ones. According to Nunnally (1970, 30),
an average adult gives 6 W, 20 D and 4 Dd responses.
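To make these averages concrete, the three counts can be converted to percentages of all location responses. This is only a minimal sketch: the function name `location_percentages` is hypothetical, and the input is simply the three average counts quoted above from Nunnally (1970).

```python
def location_percentages(counts):
    """Convert location-score counts to percentages of all location responses."""
    total = sum(counts.values())
    return {symbol: round(100 * n / total, 1) for symbol, n in counts.items()}


# Average counts cited in the text: 6 W, 20 D and 4 Dd responses.
average_adult = {"W": 6, "D": 20, "Dd": 4}
print(location_percentages(average_adult))
# {'W': 20.0, 'D': 66.7, 'Dd': 13.3}
```

On these figures, about two-thirds of an average adult's location scores are common details, which gives a baseline for reading the statement that a low percentage of D responses is indicative of maladjustment.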


A summary showing the symbols, their descriptions and the criteria of the Rorschach 
location scores is given in Table 11.1. 










Table 11.1 Symbols, descriptions and criteria of Rorschach location scores

Symbol   Description                           Criterion
W        Whole response                        When all portions of the blot are used in giving a
                                               response
DW/DdW   Confabulated whole response           When secondary emphasis is given to the
                                               interpretation of the blot as a whole and the primary
                                               emphasis is on a detail portion, either usual or
                                               unusual, of the blot
D        Usual or common detail response       When response is given on the basis of frequently or
                                               commonly identified area of the blot
Dd       Unusual or uncommon detail response   When response is given on the basis of infrequently
                                               or uncommonly identified area of the blot
S        White space response                  When response is given on the basis of white space
                                               area of the blot (usually scored in combination with
                                               W, D and Dd)
DdD      Confabulated detail response          When secondary emphasis is given upon a usual
                                               detail area and primary emphasis is given upon an
                                               uncommon or unusual detail area
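The criteria in Table 11.1 can be read as a small decision procedure. The sketch below is an illustration only. It assumes the examiner has already judged, from the inquiry and the location sheet, which area of the blot was used. The function `location_score` and its argument names are hypothetical, and the confabulated whole from an unusual detail is written DdW here. It follows Exner's recommendation of a single Dd category and scores S only in conjunction with W, D or Dd.

```python
def location_score(area, uses_white_space=False, generalized_from=None):
    """Return a location symbol for one response.

    area: "whole", "usual_detail" or "unusual_detail" (the area finally interpreted)
    uses_white_space: True when white space is part of the basis of the response
    generalized_from: for confabulated responses, the detail attended to first
                      ("usual_detail" or "unusual_detail"); otherwise None
    """
    base = {"whole": "W", "usual_detail": "D", "unusual_detail": "Dd"}[area]
    if generalized_from is not None:
        prefix = {"usual_detail": "D", "unusual_detail": "Dd"}[generalized_from]
        if area == "whole":
            return prefix + "W"   # DW or DdW: a detail generalized to the whole blot
        if area == "usual_detail" and prefix == "Dd":
            return "DdD"          # an unusual detail generalized to a usual detail
        raise ValueError("combination not covered by Table 11.1")
    return base + "S" if uses_white_space else base


print(location_score("whole"))                                      # W
print(location_score("usual_detail", uses_white_space=True))        # DS
print(location_score("whole", generalized_from="unusual_detail"))   # DdW
print(location_score("usual_detail", generalized_from="unusual_detail"))  # DdD
```

Deciding whether an area counts as a usual or an unusual detail is the hard part and is left to the scorer; the code only encodes how the symbols combine once that judgment is made.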





Determinants 

Determinants are the second most important phase of scoring in the Rorschach test. This scoring 
system is the most complex of all phases. Determinants refer to the features of the blot which have 
produced the particular response. Originally, Rorschach (1921) suggested five symbols for the
scoring of determinants: F for form, M for human movement, and three for colour
responses: FC (for form-colour), CF (for colour-form), and C (for pure colour response). It is
obvious that Rorschach’s original system for scoring the determinants was very simple. His
original system did not contain any symbol for scoring shading or chiaroscuro responses because
Rorschach’s original cards upon which his basic researches were based contained no variations
in shading. The shading features among the cards were created by a printing error
and immediately after seeing the printed cards, Rorschach thought of the possibility of this new
dimension (Ellenberger 1974). Accordingly, he added a sixth symbol for a shading or ‘chiaroscuro’
response, which was posthumously published in his last research paper in 1923. The symbol was
the parenthesized (C). Many Rorschach systematizers who worked after the death of Rorschach have
conducted a series of comprehensive researches and formulated their own scoring symbols for 
response determinants depending upon the several characteristics of blots. All these 
characteristics descriptively fall into three categories: (a) features relating to form; (b) features 
relating to colour, and (c) features relating to shading. Despite this similarity among the different
Rorschach systematizers like Mons, Beck, Rapaport, Klopfer, Exner, Hertz, etc., confusion
persists probably because the symbols used show little agreement among them. Not only this,
where the same symbols have been used by two or more Rorschachers, the criteria for use of the
symbols differ to a great extent. The net result is confusion among the scorers. Exner (1974, 70)
has divided the determinant categories into nine separate parts, for which altogether 24 symbols
have been introduced. It should be borne in mind that the success of the determinants category
depends upon proper inquiry. However, spontaneity should be the guideline for giving all
determinants symbols. If there is reason to believe that the response has been given only after the
question put by the examiner during the inquiry, it should not be scored as such. A detailed
description of these nine categories of determinants is given below.


Form Determinant (F) 


The form determinant symbolized by the capital letter F is meant for the response which has 
occurred purely because of the form or shape of the blot. Thus F is reserved for the response 
where nothing other than shape or form has contributed to the percept. Often, Rorschachers have 
found that the percentage of F in many protocols is high. F is relatively easy to score when the 
inquiry (done carefully) reveals that it is only the shape, which has evoked the response. For 
example, Card I may be wholly perceived as ‘Bat’ and on inquiry if it is revealed that this is
because the two side projections appear as the two wings and the middle part looks like the
centre of the body, the scorer, without fail, should score it as F. Some Rorschach systematizers
have divided F into two categories: F+ and F−. When the form is clear and good, it is scored as
F+ and when the form is vague, it is scored as F−.


Movement Determinants (M, FM and m) 


Three types of movement responses are usually reported in the Rorschach test: human
movement, animal movement and inanimate movement. Originally, Rorschach had
recommended scoring for only one type of movement, that is, for human movement, because in
his opinion animal movement and inanimate movements are in no way different from human
movement in the basic sense. He, therefore, provided symbol M for such movements and no
other symbols were provided. But later, Klopfer et al. (1954), Hertz (1970) and Piotrowski (1957)
disagreed with Rorschach and provided separate symbols for scoring of animal movement and
inanimate movement. Rorschach characterized M responses as being marked by ‘flexion’ or
‘extension’ depending upon the use of the centre axis of the blot area in the response. Thus, a
movement which indicates ‘pulling inside’ the centre axis is the ‘flexor’, and a movement which
indicates ‘pulling away’ from the centre axis of the blot area is the ‘extensor’. Beck et al. (1961)
have, however, added a third type of movement response because in their opinion Rorschach’s
flexion-extension does not explain all movements. This third type is the movement
which is ‘static’.

The symbol M is, thus, reserved for human movement. Although Beck, Klopfer and Hertz
accept this original derivation from Rorschach in toto, Rapaport, Gill & Schafer (1946) have
accepted the same with some modifications. According to Rapaport and his colleagues, M
should be restricted to only those responses which contain human figures, either complete or
incomplete. Exner (1974, 74) has also accepted the symbol M for human movement but with
some variations. On the lines of the flexion-extension division of Rorschach, he divides human
movement into two parts: active and passive. Running, walking, jumping and crying are all
examples of active human movement and sleeping, talking, smiling, and thinking are examples
of passive human movement. He recommends the addition of a superscript
a for active movement and p for passive movement so that the complete symbol becomes Mᵃ and
Mᵖ respectively. There are some precautions in scoring for M.

First, F should not be used when M is scored because M automatically assumes form or
shape.

Second, when there is obvious reason to believe that a response of human movement is not
spontaneous but provoked by the inquiry of the examiner, M should not be scored. (This
is held to be true for all determinant symbols.) For example, when the examinee responds to Card
VII with “It looks like two women”, this would not be scored as M because the response appears
to have been given only on the basis of form (F). But during inquiry when the examiner asks “Are
they doing something?” and if this is answered by the examinee as “Yes, the two women are
talking” then such response should never be scored as M because the response is not
spontaneous.




FM is scored for a response where content is animal [...]. However, when the movement involved is
found only among humans [...]. For example, responses like “Two butterflies talking with each
other” to Card [...] and “Two pigs playing [...]” to Card [...] would be scored simply as M.
The symbol FM was originally suggested by Klopfer in 19[..] and is frequently used by
Piotrowski and Hertz. Rorschach himself did not attach much significance to
such responses. No other Rorschachers except Piotrowski, Hertz and Klopfer have used the
scoring for FM. All responses involving animal movement, such as “Two bears going somewhere”,
“Two cats fighting”, “The rat is jumping” and “The buffalo is going”, are simply scored as FM. Like
human movement, animal movement may be active or passive. Accordingly, the superscript a for
active movement and p for passive movement should be added to FM. Precaution should be
taken in scoring FM for only those responses of animal movement which are spontaneous and
not provoked by the examiner’s inquiry or questioning.

Inanimate or inorganic or insensate objects producing a movement are scored as m, originally
suggested by Piotrowski in 1936 and incorporated by Klopfer and Hertz separately in their systems.
No other Rorschachers have used the scoring for m. For all types of inanimate movements,
Piotrowski recommended the scoring of m. Klopfer and Hertz, however, disagreed with
Piotrowski and suggested a combination of the symbol F with m. When form or shape dominates
in the perception of an inanimate movement, it should be scored as Fm and when movement
dominates over form, the symbol should be mF and where there is pure movement, it should be
simply scored as m. But these three scoring categories (Fm, mF and m) did not prove to be
popular because they overlapped with FM or M categories. Here Piotrowski’s suggestion has
been very popular, according to which all types of inanimate movements should simply be
scored as m. Some inanimate movements may be active and some may be passive. Examples of
active movement are “Aeroplane flying speedily”, “Top spinning fast” and “Car going speedily”
and for such movements, the superscript a should be added so that the complete symbol
becomes mᵃ. Examples of passive movements are ‘Ship moving slowly’, ‘Train crossing the bridge
at a slow speed’ and ‘Moon crossing a cloud slowly’ and for such movements the superscript p
should be added to obtain the complete symbol, mᵖ.
Colour Determinants: Chromatic (C, CF, FC and Cn)


Originally, Rorschach divided colour responses having chromatic features given to five cards
(Cards II, III, VIII, IX and X) into three categories: (i) those responses which are wholly based upon
colour and are scored as C; (ii) those responses where colour dominates but
shape is involved and are scored as CF; and (iii) those responses where form dominates but
colour is also a determinant. Such responses are scored as FC. Thus colour responses based upon
chromatic features are C, CF and FC, and probably these are the only scoring symbols in the
Rorschach test where there exists the least disagreement among the Rorschachers. Rorschach
also added a fourth symbol, CC, for colour naming, which was later modified by Piotrowski to
Cn. A detailed description of these colour scoring determinants is as follows:

C is scored for a pure colour response, that is, for responses where factors other than colour
are irrelevant. A few examples of pure colour responses are blood, ice cream, paint, etc. Pure
colour responses are, however, very rare in the Rorschach protocols. A C or pure colour response
is interpreted to indicate lack of control over emotions.

CF is scored for responses in which colour is primary and form features are secondary. Thus
the responses where colour is dominant and form is vague and undifferentiated are usually
scored as CF. Scoring CF is a difficult task for the scorer because some of these responses may
be scored as FC too. For example, when the examinee emphasizes that he
perceives “meat” because it looks like a piece of meat, then obviously it would be scored as FC
rather than CF because here form is primary and colour is secondary. Thus, through a nondirective
inquiry the examiner must get himself fully satisfied that the percept has primary reference to
colour and secondary reference to form in order to score it as CF.

FC is scored for a response where form dominates and the examinee also makes a secondary
reference to the colour for the purpose of elaboration or classification. A few examples of an FC
response would be lungs, heart, a red dog, a red shirt, a brinjal, etc. In all these examples, the
form of the percept dominates with a secondary emphasis on the colour. A part of the blot may
look like a heart because it is first similar in shape and then, it has red colour. However, a skilled
examiner should get this cleared during the inquiry. Even after that, if the examiner is in doubt
regarding whether or not form dominates over colour, testing-of-the-limits may be carried out. As
compared to CF, FC represents a very controlled use of colour. FC indicates control over
emotional impulses whereas CF indicates comparatively little control over them.


Sometimes it is found that the examinee simply names a particular colour and declines to 
comment further on it. Such responses in which only the colour had been named were 
recognized as colour-naming responses and were scored by Rorschach as CC. However, he did 
not attach much significance to a CC response. Piotrowski, for the first time in 1936, introduced 
the symbol Cn in place of CC and this symbol has been frequently used by many Rorschachers. A 
few examples of colour-naming responses are “Here it is pink”, “This is a red colour”, 
“Greenish”, etc. Exclamations like “Look at the colours!” or “What a nice variety of colours!” are 
not scored as Cn. 


Colour Determinants: Achromatic (C', C'F and FC')


Black, white and grey are regarded as achromatic colours. Although these colours are not 
regarded as colours by a physicist or even by a psychologist, they are important because they are 
frequently used in day-to-day life. Rorschach himself never suggested a separate scoring for these 
achromatic colours. However, this was introduced for the first time by Klopfer in 1938 who based 
the decision to score achromatic colours separately on researches done subsequent to Rorschach 
as well as upon the viewpoints expressed by Binder in 1932. Rapaport included the same scoring 
symbols and criteria in his system whereas Hertz and Piotrowski modified the scoring symbols 
but the purpose and the criterion were those of Klopfer. 

Achromatic responses, on the pattern of chromatic responses, fall into three categories and
are symbolized by three different symbols: (i) the pure achromatic colour response scored as C',
(ii) the achromatic colour-form response scored as C'F, and (iii) the form-achromatic colour
response scored as FC'.


The C' response is a pure achromatic colour response and has no reference to form or shape
at all. Such a response is very rare and uncommon. A few examples of pure achromatic colour
response are ‘coal’, ‘mud’, ‘soil’, ‘snow’, etc., and these are scored as C'.


The achromatic colour-form response is scored as C'F and refers to a response in which
achromatic colours are the primary determinant and the form has a vague and undifferentiated
reference. A few examples of C'F response are ‘Black cloud’, ‘White water’, ‘Gray sky’, etc. A
part of the blot may be perceived as ‘Black cloud’ most probably because the cloud is blackish
first, and then because its shape resembles a cloud. A skilled examiner, however, must confirm
this from the examinee through inquiry before he scores it as C'F.

FC' represents the form-achromatic colour response in which the primary emphasis is based
upon the form and the achromatic colour has only a secondary reference for the purpose of
elaboration or classification. The decision to score a response as FC' is relatively easier than C'F
because of the primary emphasis upon the form features. A few examples of the FC' response are
‘Black dog’, ‘White horse’, ‘Gray cow’, etc. Obviously, a part of the blot may be perceived as
black dog because the blot looks like a dog first (emphasis upon form) and then because it is black
(a secondary reference to colour). However, the examiner should confirm this from the inquiry.


Shading Determinants

Perhaps the least researched and the most controversial determinant category in the Rorschach
protocol is that of shading or chiaroscuro responses. A shading response is one in which the
light-dark features of the blot are used. As mentioned earlier, Rorschach originally recommended
only five symbols for determinant categories, in which no mention of shading responses was
made. But later, he also included a parenthesized (C) as a sixth symbol for shading responses.
Since then, some of the Rorschachers have worked on the determinant category of shading
responses and the net product has been the emergence of different sets of symbols with
different criteria.

Beck has divided shading determinants into three categories. The first category includes
those determinants in which shading produces the impression of texture and, depending upon the
dominance of form, this category is scored as T, TF or FT for pure texture,
texture-form and form-texture responses respectively. The second category includes those
determinants in which shading creates the impression of depth or distance and, again depending
upon the dominance of form, they are scored as V, VF or FV for pure vista response, vista-form
response and form-vista response respectively. The third category includes determinants in
which responses are based upon shading features (light-dark features) in which achromatic
colours are also involved. Depending upon form involvement, such determinants are scored Y,
YF or FY for pure shading response, shading-form response and form-shading response
respectively.

Likewise, in 1936, Klopfer divided the shading determinants into four categories. The first
category includes shading responses that tend to produce the impression of texture; these are
scored as c, cF or Fc for pure texture response, texture-form response and form-texture response
respectively. The second category includes those determinants in which shading is perceived as
general-diffuse. The symbols used are K for shading responses perceived purely as diffuse and KF
for the diffuse form shading response. The third category involves those responses in which
shading is used for vista (depth or distance), landscapes, reflections, etc. The determinant symbol
for such responses is FK. The general demerit of the Klopfer system is that it fails to provide a clear
distinction between FK responses and those involving a more diffused use of shading (scored as
KF and K). The fourth category, which was added by Klopfer in 1937, includes X-ray or
topographical map responses and, depending upon the dominance of form, they are scored as k
for pure X-ray or topographical map responses, kF for X-ray form responses and Fk for form X-ray
responses. Likewise, Hertz also used three categories of shading determinants in addition to
separate scoring for the achromatic colours. Rapaport, however, used only two categories of
shading determinants in addition to separate scoring for the achromatic colours. Thus, shading
determinants represent the widest differences among the Rorschach authorities. Despite these
differences, Exner (1974) has been able to prepare a very comprehensive and widely accepted
scoring for the shading determinants, which utilizes the three basic symbols of Beck (T, V and Y)
for the three categories of shading responses. Exner’s comprehensive system has, however,
modified the criteria for two of the three categories of shading responses (that is, for V and Y)
from those used in the Beck system. A detailed discussion of the T, V and Y categories is presented
below.







Texture Determinants (T, TF and FT): Klopfer was the first person to include a separate
symbol for texture responses. His symbol was c. Hertz later included Klopfer’s symbol in her
system. Piotrowski (1957) does not provide a separate scoring for texture responses. Rapaport et
al. (1946) scored texture responses only when they were given in combination with achromatic
colour responses. It was Beck (1944) who modified the symbol of Klopfer to T to represent texture
responses and since then, texture responses have usually been reported in the form of T by most
Rorschachers.


Texture is scored when shading (light-dark) features of the blot are used to represent tactual
stimuli. The common features of tactual stimuli are cold, hot, rough, hard, soft, smooth, sticky,
greasy, furry, silky, etc. When these words are used by the examinee, it is highly probable that
shading is involved and texture should be scored. Texture is scored in one of three ways
depending upon the extent of form involvement. T is scored for a response in which shading
features are represented as tactual or textural without the involvement of form or shape. In other
words, T is scored for a pure texture response. Responses like ‘Silk’, ‘Ice’, ‘Hair’, ‘Flesh’ and ‘Wood’
may be scored as T if shading (light-dark) features of the blot are involved and perceived as
texture with no form involvement. The pure texture response is, however, the least common of
the three texture responses.

TF is the second category of texture scoring. A response in which shading is perceived as 
texture and form is involved only secondarily, is scored as TF. In other words, TF is scored for a 
response in which interpretation of shading as texture is primary and form features are used 
secondarily for the express purpose of elaboration or classification of the percept. Responses like 
‘A soft piece of ice’, ‘A hard metal’, ‘A rough skin’, ‘A rough piece of sandpaper’ are likely to be
scored as TF provided shading features of the blot are involved. 

FT is scored for the form-texture response, that is, for a response in which form is primary 
and the shading features of the blot perceived as texture are secondarily involved. Thus, in FT 
response the primary determinant is form (that is, form is distinct and clear) and the secondary 
determinant for the purpose of elaboration or classification is the shading features perceived as
texture. For example, responses like ‘Fur coat’, ‘A glass made of hard metal’, and ‘Smooth chair’
are likely to be scored as FT provided shading (light-dark) features of the blot are involved. In all
these examples the form is clear and distinct and hence, it is the primary determinant. 


Research has revealed that, of the three texture responses, about three-fourths are FT
and occur with great frequency in all Rorschach cards except IV and VI (Exner 1974).


Shading-dimensionality or Vista Determinants (V, VF and FV): When shading or the
light-dark features of the blot are perceived as representing depth and/or dimensionality, they
refer to the shading-dimensionality or vista determinants. Rorschach made only a passing
reference to such shading responses, which involved dimensionality. However, it was Klopfer
who for the first time provided a separate scoring symbol (FK) for these responses in 1942. But
unfortunately this symbol did not become popular as it was not clearly differentiated from KF and K
responses, which were also introduced by Klopfer for responses involving the more diffuse use of
shading. Following Klopfer’s lead, Beck for the first time used for these vista responses the more
distinguishing symbols V, VF and FV. Like the texture response, the vista response is also scored in
one of three ways depending upon the extent of form involvement. Beck included reflections or the
responses based upon reflections under vista responses and scored them as FV. Klopfer has
also included such responses, for which the symbol FK has been used in his system. Exner
(1974) has not included reflections under the vista response and has provided a separate
scoring category for such responses, giving them a distinct status. The author,
therefore, accepts Exner’s view to be a more appropriate one.

V is scored for a pure vista response. A pure vista response is one in which shading features are
perceived as representing depth or dimensionality without any form involvement. A few examples
of the pure vista response are ‘Deepness’ and ‘Height’, provided the shading features are
involved. Pure vista responses are usually very rare.

VF is the second scoring determinant for shading responses having depth or dimensionality.
A response is scored as VF if the primary determinant is the shading feature perceived as depth
and the secondary determinant is form or shape. In other words, when a response utilizes shading
features of the blot to represent depth and also includes a secondary emphasis on form for the
purpose of elaboration or clarification, it is scored as VF. Responses like ‘Mountain peak’, ‘Depth
of a lake’, and ‘An aerial view of the city’ may be scored as VF provided shading features have
been used.

FV is the third scoring determinant for the vista response. A response in which the primary
determinant is form and the examinee provides shading features of the blot as depth only as a
secondary interpretation, is scored as FV. Thus in FV responses, form is dominant and primary
and the shading features perceived as depth or dimensionality are vague and secondary.
Responses like ‘A deep well’, ‘A tall tree’ and ‘A woman behind a curtain’ may be scored as FV
provided shading features of the blot are involved. In all these examples, form is distinct and clear
and is the base of the response. If the inquiry fails to satisfy the examiner regarding the dominance
of form or depth and/or dimensionality, testing-of-the-limits is recommended. Of the three vista
responses, FV occurs with the greatest frequency. Following Beck, reflections or responses based upon
reflections such as ‘Two women in a mirror’, ‘A human face in the water of the well’ would be
scored as FV.

General-Diffuse Shading Determinants (Y, YF and FY): The general-diffuse shading response
includes within itself all those shading responses which are neither texture nor vista. As a matter
of fact, it is this shading response for which Rorschach used the parenthesized symbol (C). Such
shading responses are used in a more general and nonspecific sense than those of the texture or
vista responses. Like the texture or vista response, the general-diffuse shading response can also
be scored in any of three ways depending upon the degree of form involvement.

A pure shading (general-diffuse) response is scored as Y and is defined as a response which
is exclusively based upon shading features and whose content has no form. These shading features,
however, must not be perceived either as texture or as vista. Responses like ‘Darkness’, ‘Ink’,
‘Smoke’, ‘Fog’ are scored as Y provided shading features of the blot have been used.

The shading-form response is scored as YF. A YF response is a response in which shading
features are the primary determinant and the form is a secondary determinant, that is, form is used
only secondarily for the purpose of elaboration or classification. Usually, the content of the
response has a vague and nonspecific form but the light-dark features are important for the
formation of the content. Responses like ‘Some sort of X-ray’ and ‘Dark clouds’ can be scored as
YF if shading features have been used.

The form-shading response is one in which form is primary to the formation of the response
and light-dark features are used secondarily for the purpose of elaboration or classification. Such
responses are scored as FY. Responses like ‘Dark pen’, ‘Dirty shirt’, ‘Dark ship’ may be scored as
FY if the examinee gives emphasis upon form.







Form-dimensional Response (FD) 


The form-dimensional response scored as FD is different from the form-vista response scored as 
FV, though form dimensionality is involved in both the responses. In an FV response the basis of 
dimensionality is the light-dark (or shading) feature of the blot whereas in FD response the basis 
of dimensionality is the form interpreted by the size of the blot areas. FD is a new scoring category 
introduced by Exner (1974). According to him, FD is scored for “responses which include 
perspective or dimensionality based exclusively on form, interpreted by size or in relation to 
other blot areas” (Exner 1974, 99). Responses like ‘A bird on the top of a tree’, ‘A woman lying 
down’, ‘A statue on the top of a hill’, ‘A cross on the top of a church’, may be scored as FD.


Reflection Responses (rF and Fr) and Pair Response (2) 

Exner (1974) provided a separate category for the reflection responses based upon the symmetry 
of the blot. Beck and Klopfer have both suggested the inclusion of reflection responses as one 
form of vista response. Hertz has scored reflections in two ways—reflections based upon shading 
as shading responses and reflections based on form as form responses. Exner’s scoring for 
reflection responses is the most comprehensible as well as the most easily adaptable one. Depending upon
the form involvement, he classified reflection responses into two categories as mentioned below. 

The reflection-form response is one category which is scored as rF. An rF response is one in 
which symmetry features of the blot are primary to the formation of the response and the form is 
used secondarily for the purpose of further clarification. Usually, in an rF response, the form is 
vague and nonspecific. Responses like “Sky reflected in a pond” and “A reflection of something 
in a lake” may be scored as rF. rF responses are very rare.

The form-reflection response is the second category of the reflection response and is scored 
as Fr. The Fr response is one in which form features of the blot are primary to the formation of 
response and the content is perceived as reflected due to symmetry of the blot. Thus in an Fr 
response the form is distinct and clear, which distinguishes it from an rF response. Responses like 
‘A girl in a mirror’ and ‘Own face reflected in water’ may be scored as Fr because of the 
dominance of the form, which is relatively clear and distinct. For Beck, all the above types of 
reflection responses would simply be scored as FV, which is a kind of vista response. Similarly, 
Klopfer would score them as FK which is also a kind of vista response. 

Like reflection responses, pair responses are also based upon the symmetry of the blot and 
are scored as parenthesized (2). A pair response excludes the form specificity or form quality. 
Thus when the symmetry of the blot evokes the perception of twoness or pair, the symbol (2) is 
used by Exner (1974). Responses like ‘A pair of flags’, ‘A pair of shoes’, ‘A pair of crabs’, ‘Two
bears’, and ‘Two little girls, one on each side’ may be scored as (2). However, the examiner
should carefully note that the symbol (2) should never be used when a pair or the two objects are
perceived as reflected.

Thus, it is obvious that, following Exner (1974), determinants have been divided into nine
categories such as form, movement, colour (chromatic), colour (achromatic), texture (shading),
dimensionality or vista (shading), general-diffuse (shading), dimensionality (form) and reflections
and pairs. There are altogether 24 scoring symbols used for them. For convenience, a list of these
symbols along with appropriate response names and categories is presented in Table 11.2.
Readers should note it carefully because sometimes two or more than two determinants may be
combined in one response. Such responses are known as blend responses, which will be
discussed in detail along with Beck’s organizational response, or Z, while discussing the
interpretation of Rorschach responses.



Table 11.2 Symbols, category and description of the Rorschach determinants 


Symbols   Category                                   Description

F         Form                                       Form response
M         Movement                                   Human movement response
FM        Movement                                   Animal movement response
m         Movement                                   Inanimate or inorganic movement response
C         Colour (chromatic)                         Pure colour response
CF        Colour (chromatic)                         Colour-form response
FC        Colour (chromatic)                         Form-colour response
Cn        Colour (chromatic)                         Colour-naming response
C'        Colour (achromatic)                        Pure achromatic colour response
C'F       Colour (achromatic)                        Achromatic colour-form response
FC'       Colour (achromatic)                        Form-achromatic colour response
T         Texture (shading)                          Pure texture response
TF        Texture (shading)                          Texture-form response
FT        Texture (shading)                          Form-texture response
V         Dimensionality, depth or vista (shading)   Pure vista response
VF        Dimensionality, depth or vista (shading)   Vista-form response
FV        Dimensionality, depth or vista (shading)   Form-vista response
Y         General-diffuse (shading)                  Pure shading response
YF        General-diffuse (shading)                  Shading-form response
FY        General-diffuse (shading)                  Form-shading response
FD        Dimensionality (based on form)             Form-based dimensional response
rF        Reflection and pairs                       Reflection-form response
Fr        Reflection and pairs                       Form-reflection response
(2)       Reflection and pairs                       Pair response

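The regularity in these symbols can be made explicit. For the five families that are scored in three ways (chromatic colour, achromatic colour, texture, vista and general-diffuse shading), the symbol depends only on the role of form: absent, secondary or primary. The following Python sketch is an illustration only and not part of any published scoring system; the names FormRole, BASE_SYMBOLS and determinant_symbol are assumptions introduced here.

```python
from enum import Enum


class FormRole(Enum):
    """Role of form in the response (hypothetical helper type)."""
    NONE = "none"            # no form involvement: pure response
    SECONDARY = "secondary"  # form only elaborates the percept
    PRIMARY = "primary"      # form is distinct and dominates


# Base symbols for the five three-way determinant families of the table.
BASE_SYMBOLS = {
    "chromatic colour": "C",
    "achromatic colour": "C'",
    "texture": "T",
    "vista": "V",
    "diffuse shading": "Y",
}


def determinant_symbol(family: str, form_role: FormRole) -> str:
    """Return the symbol for a family given the role of form."""
    base = BASE_SYMBOLS[family]
    if form_role is FormRole.NONE:
        return base            # e.g. C, T, Y
    if form_role is FormRole.SECONDARY:
        return base + "F"      # e.g. CF, TF, YF
    return "F" + base          # e.g. FC, FT, FY


if __name__ == "__main__":
    print(determinant_symbol("texture", FormRole.PRIMARY))
    print(determinant_symbol("achromatic colour", FormRole.SECONDARY))
```

The sketch deliberately leaves out F, M, FM, m, Cn, FD, rF, Fr and (2), which do not follow the three-way pattern, and it cannot replace the inquiry by which the examiner decides whether form is primary or secondary.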
Content 


A final task in scoring responses on the Rorschach test is to select the appropriate content.
This final task is very important for the interpretation of the responses and is done by
appropriately selecting the symbols to represent the content. The selection of appropriate
symbols to represent the content is not a very difficult task.

Originally, Rorschach in 1921 used only six symbols for the scoring of content. They were H
(human), Hd (human detail), A (animal), Ad (animal detail), Ls (landscape) and Obj (inanimate
objects). After Rorschach, several scientists found that these six categories did not provide an
adequate distinction among the content categories. As such, they developed their own symbols
based upon Rorschach’s original symbols to represent the content categories in detail. The
longest list consisting of 35 content categories was provided by Beck (1944), and the shortest list
consisting of 23 content categories was provided by Klopfer & Davidson (1962). Exner (1974), in
his comprehensive system, has included only 26 content categories out of which 22 are the basic
content categories and four are the parenthesized supplementary content categories. Various
Rorschach systematizers not only give different lengths to the list of content categories but also
use different symbols for the same category. For example, Klopfer has used the symbol At for the
anatomy content whereas Beck has used the symbol An for the same category. Since the lists of
content categories proposed by most Rorschachers vary widely, it is difficult to prepare a list
acceptable to all. However, the most common categories of content, which are acceptable to
most Rorschachers, are reproduced below.


Human Responses H, (H), Hd, (Hd)

H is used for whole or nearly whole human figures. Responses like ‘Man’, ‘Woman’, or ‘Man
having no hair on the head’ would be scored as H. (H) is used for mythological (or fictional) or
cartoonlike whole human figures. Thus responses like ‘Giant’, ‘Fairy’, ‘Ghosts’, ‘Dwarf’ and
‘Devils’, etc., would be scored as (H) rather than H. The symbol Hd is used for parts of the human
body or the incomplete human form. Limbs, head, feet, hands, fingers, person without head, etc.,
would be scored as Hd. Parts of the human body considered from the mythological or fictional
point of view are scored as (Hd). Thus ‘Eyes of the devil’, ‘Finger of a ghost’, ‘Leg of a monster’
would be scored (Hd) rather than Hd.


Animal Responses A, (A), Ad, (Ad) 

The symbol A is used for whole or nearly whole animal figures. Responses like ‘Buffalo’, ‘Cow’,
or ‘Cow having no horn’ would be scored as A. When the response of a whole animal figure is
given from the mythological or fictional point of view, the symbol (A) is used. Responses like
‘Magic horse’ and ‘Flying fish’ are scored as (A). Ad is reserved for parts of the animal body and
(Ad) is used for parts of the animal body considered from the mythological or fictional angle.
Thus ‘Head of the buffalo’, ‘Tail of a dog’, ‘Head of a frog’, and ‘Ear of an ass’ would be scored as
Ad, and ‘Wing of magic horse’ and ‘Face of a lion of Goddess Durga’ would be scored as (Ad)
because of the mythological viewpoints expressed therein.


Anatomy (An) 
An is used for anatomy (internal organs) of humans or animals. Thus responses like ‘Lungs’, 
‘Heart’, ‘Skull’, ‘Kidney’, and ‘Intestine’ (of either human being or animal) would be scored as An. 


Klopfer has used At in place of An for anatomy content. 


Nature (Na) 

Natural objects or things are scored as Na. A few examples of the natural objects are rainbow,
waterfall, storm, night, thunderstorm, fog, mist, and so on. All these responses would be scored as
Na. Clouds are, however, excluded from this content category because of their importance, and a
separate symbol Cl is provided (Exner 1974).


Botany (Bt) 
Objects representing plant life such as flower, fruit, tree, bush, and so on are scored as Bt.


Blood (Bl)
The symbol Bl is used for the blood of either human beings or animals.


Art
Paintings, family crest, and the seal of authority are written as Art in content scoring.


Clothing (Cg)
Clothing materials associated with human beings such as trouser, hat, blouse, sari and hooks are
scored as Cg. Clothing materials associated with mythological characters such as a ghost’s
trousers or a witch’s hat are scored as parenthesized (Cg).



Food (Fd) 


All responses representing food substance or edible objects are scored as Fd. Responses like
apple, fried chicken, eggs, fried fish or meat, etc., are scored as Fd.


Fire (Fi)


Responses like burning electric bulb, burning candle, actual fire, flame coming out from a stove
or torch, etc., will be scored as Fi. In the determinant category, these responses are scored as m
because they denote inanimate movement. Responses like ‘atomic explosion or blast’ are,
however, given a separate content symbol, that is, Ex, which denotes explosion. The determinant
category will continue to be m.


Household (Hh) 


Interio 


Landscape (Ls) 


Percepts involving landscapes or seascapes are scored as Ls. Thus responses like beautiful
garden, an aerial view of the city, underwater scenes are scored as Ls.


Sex (Sx)


Responses representing sex organs or sexual functions are scored as Sx. Responses like penis,
vagina, sexual intercourse, breast, testicles, menstruation, and so on are scored as Sx.


X-ray (Xy)


Responses involving percepts of X-ray are scored as Xy. Thus X-ray of the bones, X-ray of the
heart, X-ray of the intestines, etc., are scored as Xy.
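
The content categories listed above amount to a simple lookup from category to symbol, with parentheses added when the percept is mythological or fictional. The Python sketch below is only an illustration under that reading; the names CONTENT_SYMBOLS, PARENTHESIZABLE and content_symbol are assumptions introduced here, and the set of parenthesizable symbols is limited to those for which the text above gives a parenthesized form.

```python
# Content categories and symbols as listed in this section.
CONTENT_SYMBOLS = {
    "human": "H",
    "human detail": "Hd",
    "animal": "A",
    "animal detail": "Ad",
    "anatomy": "An",
    "nature": "Na",
    "clouds": "Cl",
    "botany": "Bt",
    "blood": "Bl",
    "art": "Art",
    "clothing": "Cg",
    "food": "Fd",
    "fire": "Fi",
    "explosion": "Ex",
    "household": "Hh",
    "landscape": "Ls",
    "sex": "Sx",
    "x-ray": "Xy",
}

# Symbols for which the text describes a mythological/fictional variant.
PARENTHESIZABLE = {"H", "Hd", "A", "Ad", "Cg"}


def content_symbol(category: str, mythological: bool = False) -> str:
    """Return the content symbol, parenthesized for mythological percepts."""
    symbol = CONTENT_SYMBOLS[category]
    if not mythological:
        return symbol
    if symbol not in PARENTHESIZABLE:
        raise ValueError(f"no parenthesized form is described for {symbol}")
    return f"({symbol})"


if __name__ == "__main__":
    print(content_symbol("human"))                           # 'Man'
    print(content_symbol("human detail", mythological=True)) # 'Finger of a ghost'
```

A scorer following Klopfer would substitute At for An in the table; the rest of the lookup is unaffected.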


Popular and Original Responses 


The popular or P responses are those responses which occur frequently in the Rorschach
protocols. Originally, Rorschach (1942) made no mention of the popular responses in his work.
However, he recognized the importance of such responses in his posthumously published paper,
in which he called these popular responses vulgar responses. Rorschach set up an objective
criterion: according to him, any response, to be recognized as a P response, should occur at
least once in every three protocols. This one-in-three criterion of Rorschach has been used by
some later workers, who have developed a list of popular responses from their own experiences
using the one-in-three criterion.

Rapaport et al. (1946) define a P response as that which occurs at least once in every four or
five protocols. Piotrowski (1957) has recommended the scoring of P as responses occurring at
least once in every four protocols. Hertz (1970) defined the P response as a response that occurs
at least once in every six protocols, thus using the broadest criterion.

Setting aside some minor variations among Rorschach authorities like Hertz, Mons,
Piotrowski, Rapaport, Klopfer, Exner and Beck, the popular responses evoked by the ten cards
may be enlisted as shown below.


Card I: Bat or butterfly (W), human figure (D)

Card II: Human figures (W), butterfly (D), animal forms, usually the heads of dogs or bears
(W or D)

Card III: Two human figures or single human figures, and dolls (W), butterfly (D), fish (D)

Card IV: Animal skin or a human figure covered in animal skin (W), shoe or boot (D)

Card V: Moth, vulture, eagle, bat and butterfly (W), rabbit (D), leg (d)

Card VI: Animal skin (W)

Card VII: Head or face of a woman or a child (W or D)

Card VIII: Animal figures commonly perceived as wolf, fox, coyote, dog and bear (D), tree,
bush (D)

Card IX: A head or face of a male (D), camel's head (D), tree (D)

Card X: Many-legged creatures such as crab, lobster, spider and rabbit's head (D), dogs (D)
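
The frequency criteria for a popular response differ only in their cut-off, so they can be compared side by side. The Python sketch below is an illustration under stated assumptions and not a procedure taken from any of the authorities named: the names P_THRESHOLDS and is_popular are introduced here, and Rapaport's "four or five protocols" is represented by its looser bound of one in five.

```python
from fractions import Fraction

# Minimum proportion of protocols in which a response must occur to be
# scored P, as stated in the text for each authority.
P_THRESHOLDS = {
    "Rorschach": Fraction(1, 3),
    "Piotrowski (1957)": Fraction(1, 4),
    # Assumption: "every four or five protocols" taken at its looser bound.
    "Rapaport et al. (1946)": Fraction(1, 5),
    "Hertz (1970)": Fraction(1, 6),
}


def is_popular(protocols_with_response: int, total_protocols: int,
               authority: str) -> bool:
    """Check whether a response meets the named authority's P criterion."""
    if total_protocols <= 0:
        raise ValueError("total_protocols must be positive")
    rate = Fraction(protocols_with_response, total_protocols)
    return rate >= P_THRESHOLDS[authority]


if __name__ == "__main__":
    # A response seen in 20 of 100 protocols (hypothetical figures).
    for name in P_THRESHOLDS:
        print(name, is_popular(20, 100, name))
```

With the hypothetical figure of 20 occurrences in 100 protocols, the response would count as popular under the Hertz and Rapaport cut-offs but not under those of Piotrowski or Rorschach, which shows how the choice of criterion changes the list of P responses.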

Rorschach (1942) had also suggested scoring for the original response. He defined the
original or O response as a response that occurs no more than once in one hundred protocols.
Therefore, original responses are rare and creative responses. On the basis of the form quality, the
O response is divided into two categories. An original response having a clear and distinct form is
scored as O+, whereas O- is scored for a vague and indistinct original response. Almost all
Rorschachers except Beck have included the scoring of the original response in their system but
each of them has provided a word of caution. Piotrowski (1957) has said that the scoring of the
original response is highly subjective and should be accomplished only by trained
Rorschachers. Likewise, Rapaport et al. (1946) have suggested that the decision to score the
original response is a very difficult task and only the trained Rorschacher should attempt it. Likewise,
Klopfer & Kelley (1942) consider the scoring of the original response as “a hopeless enterprise”
because the number of such responses is unlimited.


INTERPRETATION OF THE RORSCHACH PROTOCOL


The interpretation of Rorschach data is a complex task and requires considerable training, skill
and experience. Perhaps, this is the reason why there is no complete consensus regarding the
interpretation of the data even among the prominent Rorschach authorities. However, in this
book an attempt has been made to present the interpretative significance of Rorschach data in
the simplest possible form, which beginners may find profitable. Throughout the
interpretation of Rorschach responses, the method of content analysis (also known as
document analysis) has been assumed to be the correct method of analysis.
According to Rorschach, the total number of responses, or R, varies between 15 and 30 for a
normal examinee. Klopfer, however, estimated R to vary between 20 and 45, and Exner (1974)
estimated R to vary between 17 and 27 under his comprehensive system. Thus both the minimum
and the maximum total R vary from one system to another. A minor deviation from the above range
does not necessarily indicate psychopathological traits; however, it should be interpreted with
caution. A very low total R, such as 10 or below, indicates rejection by the examinee, that is, he
has rejected one or more cards of the test, and the skilled examiner should make a careful note of
it because it may indicate defensiveness, intellectual limitations, depression and organicity.
However, none of these traits can be wholly accepted or rejected solely on the basis of the total R.
Other factors, discussed later, should also be taken into account. Likewise, a very high total R,
such as 90 or 100, indicates schizophrenic tendencies (Piotrowski 1957).

Almost every Rorschacher has emphasized the reaction time to the scorable response
as well as the total time elapsed during the free association. Piotrowski (1957), however,
reported that Rorschach never used a stopwatch and never recorded the reaction
time with precision, though he noted down a longer pause. The general purpose of noting
down the reaction time was to make a comparison of the average reaction time to the chromatic
colour cards (Cards II, III, VIII, IX and X) with that of the achromatic colour cards (Cards I, IV, V,
VI and VII). Such a comparison may reveal the presence or absence of what is called
colour-shock, which is indicated when the average reaction time to chromatic cards is higher
than the average reaction time to achromatic cards.







A detailed discussion of the interpretative significance of the different symbols used in the
four categories of scoring, that is, location, determinants, content as well as popular and original
responses is presented below.

Location Scores 
All location scores, in general, indicate how the examinee approaches his environment,
particularly the ambiguities or vagueness pertaining to it, without answering why. In other words,
the location scores indicate only the manner in which the examinee approaches the ambiguities
of his world without answering why he prefers this manner or that manner. A content analysis of
the various location scores is presented below.


Of the various location scores, the W score is probably the most frequently and widely
investigated location score. W responses are given in greater frequency for Cards I, IV, V and VI
because these cards are more solid and represent the form of unbroken blots. Rorschach (1942)
had revealed that W has a direct relationship with intellectual ability and the capacity to organize
things or objects in a meaningful and coherent whole. A review of the literature reveals that this
revelation of Rorschach has aroused some controversy because some researchers have failed to
show a consistent relationship between W responses and mental ability. For example, Amirta
et al. (1955) reported a very low correlation between W responses and IQs. Likewise, Lotstaff
(1953) reported that W responses are related to verbal fluency only but not to overall
intelligence. Wittenborn (1950) found no relationship between W responses and measures of
intelligence. Because W responses are related to intelligence only for some specific cards, there
is a low correlation (about .40) between the number of W responses and IQ scores (Kaplan &
Saccuzzo 2001). The proportion of W is also an important clue. Exner (1974) reported that the
ratio of W to D (W:D ratio) for a normal adult should be 1:2. If the D side of the ratio is elevated,
it is ordinarily concluded that the examinee selects the easier perceptual-cognitive mode to act
when faced with ambiguity. On the other hand, when the W side of the ratio is elevated, it
indicates that the examinee resorts to excess effort in a bid to organize an ambiguous situation. W is
also interpreted with reference to the human movement determinant, that is, M. The average
normal adult ratio of W to M, that is, W:M (both indicative of intellect) is 2:1 to 3:1. If W exceeds
the 3:1 ratio, it indicates that the examinee has a high aspiration level, which normally exceeds
his ability level. On the other hand, if W falls below 2:1 it indicates that his aspirations are below
the level of his ability.
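
The W:M comparison described above is simple arithmetic and can be sketched in a few lines of Python. This is only an illustration under the cut-offs quoted in the text (2:1 to 3:1); the function name `classify_w_to_m` and the wording of the returned labels are assumptions made for the example, not part of any Rorschach scoring system.

```python
from fractions import Fraction


def classify_w_to_m(w: int, m: int) -> str:
    """Compare the W:M ratio with the 2:1 to 3:1 band quoted in the text.

    Hypothetical helper: the labels paraphrase the text's interpretation.
    """
    if m == 0:
        # With no M responses the ratio cannot be formed at all.
        return "undefined"
    ratio = Fraction(w, m)
    if ratio > 3:
        return "aspiration above ability"
    if ratio < 2:
        return "aspiration below ability"
    return "within 2:1 to 3:1"


# Example: 10 whole responses and 4 human movement responses give 2.5:1.
print(classify_w_to_m(10, 4))
```

Exact fractions are used so that boundary values such as 6:2 are compared without rounding error. As the text stresses, such a ratio is only one clue and is never interpreted in isolation.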


The D response, according to Rorschach, indicates the ability to perceive and react to clear
and distinct characteristics of the world. Generally D responses are easier to give than W
responses because they represent the easiest perceptual-cognitive mode to act when faced with
ambiguity. A review of the literature reveals that Card X produces the largest number of D
responses because of its broken features. Usually, in a protocol the number of D responses is
higher because D responses are easier to give than other location responses. Persons under
stress and anxiety give only a few D responses. Maladjusted persons also yield a low proportion
of D responses.


The Dd response is interpreted as a form of respite from the vagueness of the
larger areas of the blot. For a normal adult, the proportion of Dd as compared to W and D
responses is very low. It occurs at the rate of 5% in the protocol of a normal adult, that is, one
response out of 20 is Dd. When the amount of mental disturbance is high, as in
the case of schizophrenia or obsessive-compulsive neurosis, the percentage of Dd may reach
even 30%. The mere presence of the Dd response should not be interpreted as an unfavourable
sign because, when it is present in an appropriate ratio along with W and D, it indicates
initiative capacity and the capacity to withdraw. Research findings have shown that the Dd
response was positively related with internal adjustment and negatively related with external
adjustment.




The S response, according to Rorschach (1942), Beck (1945) and Rapaport et al. (1946),
indicates negativism or oppositional features. Klopfer et al. (1954) have, however, interpreted the S
response as indicative of ‘constructive self-assertiveness’ provided such responses do not
occur disproportionately in the protocol. A review of the literature supports the above
conclusions with a few contradictions. Bandura (1954) reported that S responses are
positively related with ‘oppositional’ or negativistic tendencies. Murray (1974) has, however,
found no such relationship between these two variables. Rapaport et al. (1946) reported that
paranoid patients showed the highest frequency of S responses.

Lastly, the three confabulatory types of answers, that is, the DW, the DdW and the DdD 
responses, are also important interpretative elements. These are a highly unique form of response 
and are very rare. According to Rorschach, such confabulatory types of location responses 
indicate intellectual constriction and distorted perception. All Rorschachers except Beck have
accepted this view of Rorschach and have considered the confabulatory responses to be
uniformly pathological ones. Beck (1945) has shown that these confabulatory types of responses
are given by a normal adult, especially when his intellectual capacity is superior. Following Beck,
such confabulatory responses indicate only a logical cognition (Exner 1974). 


Determinant Scores 

Determinant scores indicate psychological action, which forms the basis of a response. In other 
words, they indicate what psychological action underlies a particular response. A content 
analysis of the various determinant scores is presented below. 

The F response (or the pure form response) is the most common determinant scoring in the
Rorschach test. According to Rorschach (1942), a pure form response is related to the examinee’s
thinking or reasoning and indicates the attention-concentration features of human thinking. In
other words, it indicates the adequate affective delay or control necessary for the occurrence of the
form response. Beck (1945) and Klopfer et al. (1954) have agreed with Rorschach but with an
additional caution. They caution that the locus of the pure form response is indicative of
constriction or defense, and emotional conflict may be present there, but through deliberate
thought operations the conflict is somehow controlled. According to Rapaport et al. (1946), the
pure form responses indicate a process of formal reasoning and the ability of the examinee to
direct his attention and thinking to the elements of control and his ability to make discriminating
judgement. Such examinees also show regard for the standards of the environment. Most of the
Rorschachers have compared the pure form responses with the nonpure form responses for
extracting adequate meaning from the pure form responses. One such comparison is done
through calculating F%:

$$ F\% = \frac{\sum F}{R} \times 100 $$

Recently, Beck (1945) has argued for another method of
comparing pure form responses with nonpure form responses. His method provides for the
Lambda index. One advantage of the Lambda index over F%, according to Beck, is that it avoids
the problems involved in interpreting the percentages (in F%), especially when R varies. The
Lambda index is calculated by dividing the sum of pure form determinants by the sum of nonpure
form determinants. Hence,

$$ \text{Lambda index} = \frac{\sum F}{\text{Sum of non-F determinants}} $$

The range for an average normal adult lies from 0.50 to 1. When the Lambda index exceeds
1, it indicates excessive affective constraint and when it falls below 0.50, it indicates affective
instability. Several research findings have revealed, in general, that when the examinee is in a
state of defensiveness, the number of pure form responses is increased, thus supporting Beck and
Klopfer's position. A low frequency of pure form responses occurs when the examinees are
instructed to respond very quickly because this deprives them of the delay necessary for the
formulation of the pure form answer (Hafner 1958). It has also been demonstrated that intoxication
tends to increase the frequency of pure form responses but the quality of such responses is reduced.
Likewise, it has also been demonstrated that psychopaths tend to give fewer pure form responses
than alcoholics.
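
The two form-based indices discussed above can be illustrated with a minimal Python sketch. It assumes that the scorer has already counted the pure form determinants, the non-F determinants and the total R; the function names (`f_percent`, `lambda_index`, `describe_lambda`) and the label strings are hypothetical and are introduced only for this example. The cut-offs 0.50 and 1 are the ones quoted in the text.

```python
def f_percent(pure_f: int, total_r: int) -> float:
    """F% = (sum of pure form responses / R) x 100."""
    if total_r <= 0:
        raise ValueError("total R must be positive")
    return 100.0 * pure_f / total_r


def lambda_index(pure_f: int, non_f: int) -> float:
    """Lambda = sum of pure F determinants / sum of non-F determinants."""
    if non_f <= 0:
        raise ValueError("at least one non-F determinant is needed")
    return pure_f / non_f


def describe_lambda(value: float) -> str:
    """Apply the 0.50 to 1 range quoted in the text (labels are illustrative)."""
    if value > 1.0:
        return "excessive affective constraint"
    if value < 0.5:
        return "affective instability"
    return "average range"


# Example: 8 pure form responses and 12 non-F determinants in a protocol of R = 20.
print(f_percent(8, 20))                      # percentage of pure form responses
print(describe_lambda(lambda_index(8, 12)))  # interpretation band for Lambda
```

Unlike F%, the Lambda index does not use R in its denominator, which is the advantage over percentages that the text attributes to Beck.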


The pure form response has also been interpreted in terms of its quality, that is, in terms of
good form (F+) or poor form (F-). According to Beck (1945), the F+ response indicates that the
examinee has respect for reality whereas the F- response indicates that he has no respect for the
reality of the environment. Some researchers have demonstrated a relationship between F+
responses and the intellect. Klopfer & Kelley (1942) reported a significant positive correlation
between a lower percentage of F+ responses and mental retardation. In one of their early studies,
Beck & Molish (1967) reported a high correlation between the lower percentage of F+ responses
and limited intellectual endowment. F+ responses also indicate the ability to deal with stress
effectively (Goldberger 1974). Some Rorschachers have argued for the calculation of an F+%,
which is equal to

$$ F{+}\% = \frac{\sum F+}{(\sum F+) + (\sum F-)} \times 100 $$

For a normal adult (who is neither disturbed nor of low
intelligence), F+% should be above 80%. An F+% exceeding 80% is taken as indicative of the fact
that the person has firm control over his intellect and behaviour.
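
As a small worked illustration of the F+% computation, the following Python sketch assumes that the good-form and poor-form responses have already been counted. The function name `f_plus_percent` is hypothetical, and the multiplication by 100 is an assumption that follows from the index being expressed as a percentage.

```python
def f_plus_percent(f_plus: int, f_minus: int) -> float:
    """F+% = F+ / (F+ + F-) x 100, for counts of good-form and poor-form responses."""
    total = f_plus + f_minus
    if total == 0:
        raise ValueError("no pure form responses to evaluate")
    return 100.0 * f_plus / total


# Example: 17 good-form and 3 poor-form responses.
value = f_plus_percent(17, 3)
print(value, value > 80.0)  # compared with the 80% level quoted in the text
```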
Movement determinants (M, FM and m) are the second important interpretive elements in
the Rorschach test. Out of the above three movement determinants, M is the most widely
investigated one. According to Rorschach, M responses indicate the phenomena of
‘internalization’, that is, they indicate the ability of the examinees to handle the more deliberate
and sophisticated inner experience (marked by organization and reasoning) in a way that they
can be controlled emotionally. Thus M responses indicate an ideational type of thinking, which
grows with cognitive maturation. Klopfer et al. (1954) and Hertz (1974) have argued that M
responses indicate a functional relationship between the inner world and the external world of
reality. Beck (1945) suggests that M responses indicate awareness towards the external world and
reflect some conflicts or emotions which do not get obvious expression in behaviour.
Thus the three prominent Rorschachers, namely, Klopfer, Hertz and Beck, agree with Rorschach that
M responses indicate the phenomenon of internalization, and all
of them regard M responses as bridging the inner world with reality.
Piotrowski (1960) and Rapaport et al. (1946), while agreeing with Rorschach, have maintained a
slightly different position. Piotrowski argues that M responses indicate a sort of deeply embedded
and not easily modifiable tendency to maintain a uniform attitude in dealing with others.
Rapaport holds that M responses indicate a type of delay of response occurring due to some
cognitive functions in a bid to give a more deliberate response. Thus, M responses are indicative
of a delay of spontaneous response, the result of which may be either
internalization or externalization. Recently, some researches have been conducted in a bid to
correlate M responses to different overt and covert behaviours. Dana (1968) conducted one study
in which M responses were examined in relation to six important constructs: fantasy, time,
estimation, intellect, creativeness, and interpersonal relations. His observation was that M
responses may be related to any or all of the above six behaviours. Several studies have
demonstrated a relationship between intelligence and M responses (Abrams 1955;
R. Sommer & D. Sommer 1958). Cocking, Dana & Dana (1969) have reported that M responses
are related with intellect, fantasies, and time estimation. M responses have also been interpreted
in terms of other responses on the test, especially the colour-responses, that is, in terms of the
ratio of sum M to the weighted sum C. A detailed discussion of this ratio appears later in this
chapter in the context of Erlebnistypus (EB) or the ‘experience balance’, one of the five variables
introduced by Rorschach in the interpretation of the protocol.







FM response (or animal movement response) is another important determinant much 
emphasized by Klopfer, Hertz and Piotrowski. According to them, FM responses indicate 
‘impulsiveness’ of a person, that is, the trait of a person in which he is more guided by need for
immediate gain or gratification than for long-term goals. The trait of impulsiveness is more 
common among children and aged persons than among adults. Accordingly, it has been reported 
by Klopfer & Davidson (1962) that the frequency of FM as compared to M is high for aged persons 
and Ames (1960) reported that in child protocols, the frequency of FM exceeds the frequency of 
M. Some independent studies have been conducted to examine the relationship between the FM
responses and the correlates of impulsiveness such as aggressiveness, distractibility, 
irresponsibility, defensiveness, etc. In her study, Hann (1964) demonstrated that FM responses 
are related with the different measures of defensiveness such as substitution where overt 
expression of impulse is found, and with rationalization, regression and internalization where the 
impulse does not get an overt behavioural expression. Sommer & Sommer (1958) have found that 
FM responses are significantly correlated with aggressiveness and assaultiveness. Thompson 
(1938) demonstrated that irresponsibility, aggressiveness and distractibility as measured by 
the MMPI were highly correlated with the preponderance of FM responses. Berryman (1961) in his study
of creative artists found that FM responses were positively related to their level of productivity. 

The m responses (or the inanimate movement responses) indicate impulses or thoughts which
are ‘threatening’ to the stability of the personality and are usually beyond the control of persons 
(Klopfer et al. 1954). Hertz (1942) and Piotrowski (1957) interpreted the m responses in slightly 
different ways. While they did not consider such responses as threatening, they took them as 
representing impulses or thoughts which are not properly integrated into the cognitive functions. 
No other Rorschachers, except the above three, have included the scoring for m in their systems. 
The most common interpretation of m responses among the three Rorschachers is that such 
responses are associated with frustration occurring due to imperfect interpersonal behaviour. A 
few researches regarding m responses have revealed something more than what has been said 
above. Majumdar & Roy (1962) have reported that m responses occur in greater frequency
among juvenile delinquents. Neel (1960) in her study has demonstrated that m responses 
increase with an increase in inhibitions, tension and conflicts, commonly created by the failure 

to integrate needs with behaviour. The active-passive dimension of the three movement scores,
M, FM and m, is also of some interpretive significance. Passive movements of the three types, in
general, are related to depression and active movements are taken to be indicative of character
disturbances.

The chromatic colour determinants (C, CF, FC and Cn) are of next important interpretive
significance. According to Rorschach (1942), these chromatic colour responses indicate 
affectivity or emotional excitability. C, Cn and CF responses in which colour-responses are 
dominant, indicate affectivity which has no capability for adaptiveness. FC responses, in which 
the form rather than the colour is dominant, indicate affectivity with the capability for 
adaptiveness. This is known as Rorschach’s colour-affect hypothesis. As interpreted above, this 
hypothesis holds that when the proportion of FC to CF + C (that is, FC : CF + C) is high, the
examinees possess the degree of control (capability for adaptiveness) during the affective or 
emotional state but when the reverse is true, it indicates that the examinees do not possess the 
degree of control during the affective or emotional state. In calculating the above ratio, the 
frequency of Cn determinants is included within the C determinant. There are two controversial

issues relating to Rorschach’s colour-affect hypothesis. The first issue relates to the concept of
colour-shock. This concept was introduced by Rorschach himself and was defined as the startle
reaction to colour blots. The existence of colour-shock was inferred on the basis of
signs like longer reaction time to chromatic colour cards (II, III, VIII, IX and X), inability
to respond to the colour cards, exclamation, etc. Colour-shock indicates anxiety neurosis and a
seriously impaired ability to respond, which is produced by the emotional disequilibrium due to
affect produced by colour. The second issue of the colour-affect hypothesis is related to what Klopfer
& Kelley (1942) have called 8-9-10% and Beck et al. (1961) have called the ‘affective ratio’,
abbreviated as Afr. The affective ratio is equal to

$$ Afr = \frac{\sum R(\text{VIII} + \text{IX} + \text{X})}{\sum R(\text{I to VII})} $$

Klopfer & Kelley express it in terms of percentage. The range of the affective ratio for a normal adult
is 0.55 to 0.75. An affective ratio below 0.55 indicates withdrawal or passivity
towards affective stimulation and an affective ratio above 0.75 indicates that the person has an
uncontrolled tendency to be caught up by affective stimulation. In other words, this issue is
related to the proportional number of responses to the last three cards (VIII, IX and X), which are fully
chromatically coloured. Both Klopfer and Beck have agreed that when the
proportion of responses or R to the last three cards is high, the examinees are
responsive to affective stimulation. But when the converse is true, it indicates that the examinees
are withdrawn from affective stimulation. This simple revelation has, however, been
questioned by researchers like Meyer (1951), Perlanan (1951) and Dubrovner et al.
Some researchers have made correlational studies of chromatic colour-responses
and reported important revelations. Gardner (1951) has obtained a significant positive
correlation between chromatic colour-responses and impulsiveness. Kabler & Steil (1953)
have demonstrated that chromatic colour-responses are not related to all kinds of affects. For
example, in depression such responses tend to almost disappear.
Weigl (1941) has demonstrated that the Cn response
indicates some kind of brain damage. Piotrowski has reported that Cn
indicates some kind of organic involvement and thus has supported Weigl. It has also been found
that Cn responses are given by persons with diencephalic lesions.
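
The affective ratio lends itself to a short worked example. The Python sketch below assumes that the number of responses given to each card is available as a mapping from card number (1 to 10) to a count; the names `affective_ratio` and `describe_afr`, the label strings and the sample protocol are all hypothetical. The limits 0.55 and 0.75 are those quoted in the text.

```python
from typing import Mapping


def affective_ratio(responses_per_card: Mapping[int, int]) -> float:
    """Afr = R to Cards VIII + IX + X divided by R to Cards I to VII."""
    last_three = sum(responses_per_card.get(card, 0) for card in (8, 9, 10))
    first_seven = sum(responses_per_card.get(card, 0) for card in range(1, 8))
    if first_seven == 0:
        raise ValueError("no responses to Cards I to VII")
    return last_three / first_seven


def describe_afr(afr: float) -> str:
    """Apply the 0.55 to 0.75 normal adult range quoted in the text."""
    if afr < 0.55:
        return "withdrawal or passivity towards affective stimulation"
    if afr > 0.75:
        return "caught up by affective stimulation"
    return "within normal adult range"


# Illustrative protocol: 12 responses to Cards I to VII, 8 to Cards VIII to X.
protocol = {1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 1, 7: 1, 8: 3, 9: 2, 10: 3}
print(round(affective_ratio(protocol), 2), describe_afr(affective_ratio(protocol)))
```

Expressing the same quantity as a percentage, as Klopfer & Kelley do according to the text, only changes the scale and not the comparison.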
The achromatic colour responses (C', FC' and C'F), based upon the white-grey-black
features of the blot, indicate an affect that is suppressed rather than expressed directly to
the external world, that is, the affect is not exposed directly to the world (Klopfer & Kelley 1942).
Since the affect is not exposed directly, its behavioural consequences are not so obvious as when
the affect is not suppressed (usually reflected through chromatically coloured cards). The most
common experiences associated with C' responses are tension, constraint and/or pain, all or any
of which may lead to disequilibrium of cognitive stability. Rapaport et al. (1946) considered the
C' responses to be indicative of conscious and deliberate control or defense against affective
expression to the external world. Piotrowski (1957) has agreed with Rapaport and Klopfer in the
interpretation of C' responses. However, he has added that such responses may also indicate
euphoric character, especially when white or light grey features of the blot become the basis of
such a response. Klopfer, Rapaport and Piotrowski also agree that when the basis of C' responses
is black and grey features or black-grey-white features, they tend to indicate depression.
When form dominates over C' (scored as FC'), it indicates that the impact of affect on
cognitive stability is less intense and well-controlled, but when form plays a secondary role and
C' dominates (scored as C'F), it indicates that the impact of affect on cognitive stability is
disruptive and uncontrolled.


The shading responses are the next important determinants and hence of next important
interpretive significance. As discussed earlier, shading responses are of three types: texture (T, TF
and FT), vista (V, VF and FV) and general-diffuse (Y, YF and FY). Of these three, the texture
determinants are the most popular ones. According to Hertz, texture responses indicate
cautiousness in establishing affectively rewarding relationships with the external environment.
Klopfer et al. (1954) have argued that the texture responses represent affective need. Beck (1945)
suggests that texture responses indicate painful affective experience, which is usually related to
infantile needs, particularly the erotic needs of the infantile period. When form dominates over
texture (scored as FT), it indicates that the person is able to control his affective needs
and can use them to his own advantage (Beck 1945). Klopfer et al. (1954) agree with Beck and
have further suggested that FT responses indicate the awareness and differentiation of the
need for affection and dependency. Where texture dominates so that form or shape plays only a
secondary role (scored as TF), it indicates that the painful affective experience is disruptive and
interfering with proper interpersonal behaviour (Klopfer et al. 1954; Beck 1945).

The vista responses (symbolized as V, VF and FV by Beck) are the shading responses based
upon dimensionality. Such responses are very rare and their absence from the protocol is, in
general, regarded as a healthy sign (Exner 1974). Beck (1944) suggests that vista responses
indicate a painful feeling tone in which depression of affect and inferiority feelings are also
involved. Klopfer et al. (1954) interpret shading responses based upon dimensionality (or vista
responses) as representing the person's ability to deal with anxieties and conflicts objectively.
There are a limited number of studies on behavioural correlates of vista responses and a few of
them have revealed some important facts. Meltzer (1944) has demonstrated that stutterers give
more vista responses than nonstutterers. Likewise, vista answers occur in greater frequency
among alcoholics than among psychopaths (Exner 1974).


The diffuse-shading responses (scored as Y, YF and FY by Beck) are the third category of
shading responses and are more common and frequent than the texture or vista responses in a
protocol. The diffuse-shading responses, in general, indicate anxiety and withdrawal from the
environment or passivity. When diffuse-shading responses are wholly related to anxiety, it is
commonly referred to as the shading-anxiety hypothesis, and when diffuse-shading responses are
wholly related to the tendency for withdrawal, it is referred to as the shading-passivity hypothesis.
Klopfer et al. (1954) argue that such responses indicate a free-floating anxiety against which the
individual remains unable to build any defences. They thus support the shading-anxiety
hypothesis. Rapaport et al. (1946) have supported Klopfer’s interpretation. Beck (1945) suggests
that the diffuse-shading responses indicate a general withdrawal tendency from the environment
and thus supports the shading-passivity hypothesis. He further suggests that Y and YF responses
indicate the experience of an extreme form of withdrawal from the environment leading to
complete inability to respond, whereas FY responses indicate that the experience is a mild form
of withdrawal or passivity leading to a simple delay in responding. There are a number of studies
supporting the shading-anxiety hypothesis and the shading-passivity hypothesis, the latter being
supported more consistently than the former. Eichler (1951) has found that when the subjects are
put under an experimentally induced stress situation, the number of diffuse-shading responses
sharply increases. Levitt & Grosz (1960) have demonstrated that hypnotically induced anxiety
tends to increase the frequency of the diffuse-shading responses. Other researchers have failed to
support the shading-anxiety hypothesis. Holtzman, Iscoe & Calvin (1954) have demonstrated in
one study that there is no relationship between the diffuse-shading responses and the scores on
the Taylor Manifest Anxiety scale. Goodstein (1954) and Goodstein & Goldberger (1955)
reported a similar finding. Elstein (1965) has extended support to the shading-passivity
hypothesis, which favours Beck’s view. He found a positive relationship between the
diffuse-shading responses and the tendency towards withdrawal.

The form dimensional response (FD) is a relatively new scoring category introduced by
Exner (1974). FD responses tend to indicate internalization involving self-inspection or
self-awareness. Persons giving such responses are prone to be sensitive and self-critical.

Reflection (rF and Fr) and pair (2) responses were separately scored and analyzed by Exner
(1974). His analysis has revealed that reflection and pair responses indicate egocentricity or
self-centredness. He has given a formula, which estimates the egocentricity of a person:





$$ \text{Egocentricity Index} = \frac{3r + (2)}{R} $$

where r includes both rF and Fr, (2) is the number of pair responses, and R means the total
number of responses on all the ten cards. Hence,

$$ \text{Egocentricity Index} = \frac{3(rF + Fr) + (2)}{R} $$

The range of the above ratio for a normal adult falls
between 0.30 and 0.40. If the ratio falls below 0.30, it indicates a poor self-evaluation due to
excessive concern about others and values of the world. If the ratio is above 0.40, it indicates
excessive value given to the self even at the cost of others and values of the world. In one study of
homosexuals, character disordereds and depressives, he demonstrated that homosexuals and
character disordereds gave more reflection responses than depressives did. Likewise,
homosexuals gave a larger number of pair responses as compared to other groups, whereas
depressives gave fewer pair responses as compared to other groups.
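
The egocentricity index can be computed directly from three counts. The following Python sketch assumes that the reflection responses (rF and Fr together), the pair responses and the total R have already been tallied; the function names and the label strings are hypothetical, while the weights and the 0.30 to 0.40 range are the ones given in the text.

```python
def egocentricity_index(reflections: int, pairs: int, total_r: int) -> float:
    """Index = (3 x reflection responses + pair responses) / R."""
    if total_r <= 0:
        raise ValueError("total R must be positive")
    return (3 * reflections + pairs) / total_r


def describe_egocentricity(index: float) -> str:
    """Apply the 0.30 to 0.40 normal adult range quoted in the text."""
    if index < 0.30:
        return "poor self-evaluation"
    if index > 0.40:
        return "excessive value given to the self"
    return "within normal adult range"


# Example: 1 reflection response and 5 pair responses in a protocol of R = 25.
index = egocentricity_index(reflections=1, pairs=5, total_r=25)
print(index, describe_egocentricity(index))
```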


Content Scores 

Content scores also provide very important interpretive clues regarding the person’s needs, 
interests, interactions and about his various preoccupations. Of the different content scorings, 
probably the three content scorings, that is, animal, human and anatomy have been widely 
studied and the evidences regarding them are more precise than evidences on other content 
categories. Several studies have demonstrated that the animal contents, that is, A and Ad, 
represent the single major class of response (Wedemeyer 1954; Neff & Glaser 1954). Animal 
contents also tend to be influenced by age—children giving the largest number of animal 
contents and adults giving comparatively a low number of animal contents. In general, an excess 
of animal contents or higher A% (that is, $\frac{A + Ad}{R} \times 100$) indicates intellectual constriction and/or
emotional disturbances. Normally, A% should range from 35% to 45% of the responses. A low 
A% tends to indicate the idiographic character of the person who perceives the objects of the 
environment in his own peculiar way. Human contents, that is, H and Hd are the second largest 
single class of response. Human contents also tend to be influenced by age—children giving 
fewer human content responses and adolescents as well as adults giving a larger number of 
human contents. A high frequency of H responses indicates better cognitive development and his 
potential for good relations. A low frequency of H responses, on the other hand, is usually 
associated with delinquency (Ray 1963). Klopfer et al. (1954) argue that the high frequency of Hd 
answers is usually associated with compulsive behaviour and indicates that the subject's 
preoccupations hinder his interpersonal relationship with others. Some Rorschachers have 
argued for interpretation in terms of an H:Hd ratio. For a normal adult, this ratio falls in the range 
of 2:1 to 3:1. Reversal of this ratio (that is, when Hd exceeds H) indicates depression, anxiety, 
constructive defences and intellectual limitations. Content scores like (A), (Ad), (H) and (Hd) tend 
to indicate excessive passivity or withdrawal from the general world of reality into a world of 
fantasy. In the interpretation of animal and human contents, two ratios are frequently computed 
and compared: 

First ratio = (H + Hd):(A + Ad) 
Second ratio = (H + A):(Hd + Ad) 


(These ratios also include (H), (Hd), (A) and (Ad) responses.) 

Commonly, the first ratio tends to fall in the range of 1:2. Any reversal or imbalance of the 
ratio tends to indicate impairment of cognitive development and of effective social relations. The 
second ratio, suggested by Klopfer, gives emphasis to the occurrence of the whole vs detail. It 
normally falls in the range of 3:1 to 4:1, which indicates ability to perceive things or objects in 
their right perspective. Reversal in this ratio tends to indicate excessive concern for details, often 
associated with some kinds of pathological traits. The above meaning of the two ratios stands 
valid only when (H), (Hd), (A) and (Ad) do not dominate or occur at great frequency. If 
such parenthesized responses occur more often than the nonparenthesized responses, the ratios 
become meaningless and the parenthesized responses should be treated separately to convey their 
appropriate meaning. 
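
The arithmetic behind A% and the two content ratios can be illustrated with a short Python sketch. The function name, the dictionary keys and the example tallies are hypothetical and chosen only for illustration; the 35% to 45% band is the normal range for A% quoted above, and the sketch does not attempt any clinical interpretation.

```python
def content_summary(counts, total_responses):
    """Summarize animal/human content tallies from a protocol.

    counts: dict with tallies for 'H', 'Hd', 'A' and 'Ad' (missing keys count as 0).
    total_responses: R, the total number of responses in the protocol.
    """
    h, hd = counts.get("H", 0), counts.get("Hd", 0)
    a, ad = counts.get("A", 0), counts.get("Ad", 0)

    a_percent = 100.0 * (a + ad) / total_responses  # A% = (A + Ad)/R x 100
    return {
        "a_percent": a_percent,
        "a_percent_in_normal_range": 35.0 <= a_percent <= 45.0,
        "first_ratio": (h + hd, a + ad),    # (H + Hd):(A + Ad)
        "second_ratio": (h + a, hd + ad),   # (H + A):(Hd + Ad)
        "hd_exceeds_h": hd > h,             # reversal of the H:Hd ratio
    }


# Hypothetical tallies for a 30-response protocol.
summary = content_summary({"H": 4, "Hd": 2, "A": 9, "Ad": 3}, total_responses=30)
print(summary)
```

The ratios are returned as pairs of raw counts rather than reduced fractions so that the original tallies remain visible to the interpreter.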







Anatomy responses also occur frequently in the Rorschach protocol. In general, anatomy 
responses indicate excess preoccupation with bodily concerns without any physiological illness. 
The person releases his affect or emotions towards the stress through these preoccupations. It has 
been reported that psychosomatics usually give more anatomy responses than neurotics (Shatin 
1952). A very close associate of the anatomy response is the X-ray response, and it also tends to 
indicate the same form of preoccupation. The difference is that the preoccupation revealed by 
the X-ray response is more intense and painful because the reactions of the person towards the 
stress are not released but kept concealed or suppressed. 


Other content scores have been meagrely explained. Sex and blood responses tend to 
indicate sexual or aggressive acts. Nature, cloud and botany contents indicate emotional 
deprivation. Responses like fire, landscape, household, art, clothing, etc., tend to indicate various 
kinds of preoccupations causing interference in affective adjustment. 


Popular and Original Responses 


According to Rorschach, Popular or P responses indicate the capability for conventional 
thinking. Beck and Hertz have demonstrated that a high frequency of P responses indicates 
intellectual superiority, whereas a low frequency of P responses indicates intellectual 
retardation. The range of P responses for a normal adult lies between 5 and 8. A frequency of P 
responses exceeding the upper limit (that is, more than 8) indicates obsession-compulsion and/or 
depression. On the other hand, a frequency of P responses falling below the lower limit (that is, 
less than 5) indicates inability to perceive the most common things (which are usually perceived 
by others), aloofness from the general world, and inability to conform in thinking. Original or 
O responses may also indicate creativeness or inventiveness of the subject (Exner 1974). 


Blend Responses and Organizational Activity 


A blend response is one where two or more determinants have been used in the 
formulation of a response. In scoring a blend response, all the determinants constituting the 
blend are entered and separated by a dot (.), such as FM.T, which means that the response 
contains both animal movement and pure texture. Likewise, a blend may include three 
determinants such as M.C.V, which means that the response contains human movement, pure 
colour and vista. The majority of blend responses, however, consist of two determinants. It 
was Beck who, in 1937, first introduced the scoring of blend responses with the 
help of a dot (.). Readers should note that the scoring symbols for determinants in a blend 
are the same as would be given if the determinants were scored separately. Each 
determinant in a blend is noted in its original symbol, and determinants which are included in 
blends are not counted again when the frequencies for each single determinant are being 
counted. The blend response is usually interpreted to indicate complexity in psychological 
activity at the time of responding. According to Exner (1974), such blend responses indicate a 
higher level of intelligence. When the frequency of blend responses is very low, it indicates 
‘undue constriction of the psychological processes’. The number of blend responses is compared 
with the total number of responses, that is, R in the protocol. If the number of blend responses 
exceeds one fifth of R, the frequency of such responses is taken to be high, and if the number 
of blend responses is below one fifth, the frequency is taken to be low. 
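
The dot notation and the one-fifth rule described above lend themselves to a brief Python sketch. The function names are hypothetical, and the label returned when the number of blends is exactly one fifth of R is an assumption, since the text does not say how that boundary case should be treated.

```python
def split_blend(score):
    """Split a dot-separated blend score such as 'FM.T' into its determinants."""
    return [part for part in score.split(".") if part]


def blend_frequency(blend_count, total_responses):
    """Classify the number of blend responses against one fifth of R."""
    threshold = total_responses / 5
    if blend_count > threshold:
        return "high"
    if blend_count < threshold:
        return "low"
    return "at threshold"  # assumption: the text leaves this case undefined


print(split_blend("M.C.V"))    # the three determinants of the blend
print(blend_frequency(7, 30))  # 7 blends in a 30-response protocol
```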
Organizational activity is another important response on the Rorschach test, though 
Rorschach himself did not provide a formal scoring for such a response. In 1933, Beck became the 
first to introduce a formal scoring for organizational activity under the symbol Z. Z 
is an extension of the concept of whole (W). It indicates creative ability, that is, 
the ability to perceive and create new wholes. He divided organizational activity into three 
types depending upon the type of organization and the complexity of the stimuli, each 
type being weighted differently. The symbol for all types of organizational activity was, 
however, the same Z. The three types of organizational activity are as under: 







1. An organizational activity which meaningfully integrates two or more adjacent 
detail areas 

2. An organizational activity which meaningfully integrates two or more nonadjacent 
detail areas 

3. An organizational activity in which white space is meaningfully integrated with details 
of the blot 

One important requirement in scoring a response as Z is that form must always be 
involved, either in a primary or secondary way. This automatically means that a pure colour 
response (both chromatic and achromatic), a pure texture response, a pure vista response and a 
diffuse-shading response would never be scored as Z. In 1940 Hertz also introduced scoring for 
organizational activity in her system, but the different types of organizational activity, unlike Beck's, 
were weighted equally. Organizational activities are interpreted as indicating intellectual 
activity. Hertz, however, interpreted organizational activities as the composite representative of 
intellect, creativeness, drive and efficiency. A very low frequency of Z scores indicates depression 
and anxiety in persons who are of superior intelligence. 

For making a full and coherent interpretation of a Rorschach protocol, it is also essential that 
the five main variables introduced by Rorschachers be clearly understood. The five variables are: 
the Erlebnistypus (EB), the Experience Actual (EA), the Experience Potential (ep), the Experience 
Base (eb) and the Blend response. The last variable has already been explained and interpreted, 
Therefore, in the following paragraphs only the first four are discussed. 

The Erlebnistypus or Erlebnistyp (EB) is one of the important ratios suggested by Rorschach 
in 1921. It is also referred to as the Experience Type, the Experience Balance, the 
movement-colour sum ratio or M:ΣC. The EB represents a comparison of the sum of the human 
movement determinant (M) with the weighted sum of the chromatic colour determinants (that is, C, 
FC, CF and Cn). The weighted sum of C is calculated by assigning the following values to each 
colour determinant: FC = 0.5; CF = 1.0; C or Cn = 1.5. It may also be written as: 

ΣC = (FC + 2CF + 3C) / 2 


Rorschach’s contention, which is still supported by many studies, is that the 
movement-colour sum ratio or the EB indicates basically two types of constitutionally 
predisposed response tendencies. If the M of M:ΣC is greater, it indicates introversiveness or 
proneness to use one’s inner life for the satisfaction of basic needs. On the other hand, if the 
ΣC of M:ΣC is greater, it indicates extroversiveness or proneness to use social interactions as a 
means for gratification of one’s own important needs. Such a person likes more to express his 
affect towards the world than an introversive person. Some Rorschachers have fixed a certain 
ratio in advance; when M outweighs that ratio, introversiveness is indicated, and when ΣC outweighs 
that ratio, extroversiveness is indicated. For example, Klopfer et al. (1954) have said, “when M 
responses outweigh the ΣC by 2:1 there is an introversive balance; whereas when ΣC outweighs 
by 2:1 there is an extroversive balance.” Rorschach described a rare possibility in which both 
sides of the ratio are equal, that is, M becomes equal to ΣC. He called such persons ambiequal. 
Such persons share the experiences of introversiveness as well as extroversiveness, and 
they are described as showing failure to establish a certain response tendency for the gratification 
of basic needs. 
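
As a worked illustration of the weighted colour sum and the direction of the EB, the following Python sketch may help. The function names and example counts are hypothetical. The sketch compares M and ΣC directly, as in Rorschach's description; it does not implement the 2:1 criterion of Klopfer et al., which would require a different comparison.

```python
def weighted_colour_sum(fc, cf, c, cn=0):
    """Weighted sum of chromatic colour determinants: FC = 0.5, CF = 1.0, C or Cn = 1.5."""
    return 0.5 * fc + 1.0 * cf + 1.5 * (c + cn)


def erlebnistypus(m, fc, cf, c, cn=0):
    """Return (M, sum C, tendency) for the EB ratio M : sum C."""
    sum_c = weighted_colour_sum(fc, cf, c, cn)
    if m > sum_c:
        tendency = "introversive"
    elif m < sum_c:
        tendency = "extroversive"
    else:
        tendency = "ambiequal"
    return m, sum_c, tendency


# Hypothetical protocol: 5 M, 2 FC, 1 CF and 1 C responses.
print(erlebnistypus(5, fc=2, cf=1, c=1))
```

With these counts the weighted sum is 0.5(2) + 1.0(1) + 1.5(1) = 3.5, which agrees with (FC + 2CF + 3C)/2 = (2 + 2 + 3)/2.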

The Experience Actual (EA) is the second important variable of interpretative 
significance. The EA was first introduced by Beck (1960). The EA is obtained from the 
data shown in the EB and is equal to the sum of all components appearing in the EB. It may be written 
as under: 

EA = M + ΣC 




Beck has pointed out that EA indicates that the affective experiences of persons are organized 
or controlled. 

The Experience Base (eb) represents a ratio which is based upon a similar ratio suggested by 
Klopfer et al. (1954). The eb, also known as the confirmatory ratio, is a comparison of 
all nonhuman movement determinants with all categories of pure shading determinants and 
achromatic determinants. Thus eb* may be written as indicated below: 


eb = Σ(FM + m):Σ(T + V + Y + C') 


According to Klopfer, the eb indicates a kind of response tendency (either introversive or 
extroversive), which has not been fully accepted by the persons at a given time. The introversive 
tendency is reflected by (FM + m) and extroversive tendency is reflected by (T + V + Y + C'). 
Both of these response tendencies are such that they are not easily controlled by higher cognitive 
actions. It is obvious from the above facts that the interpretive significance of EB and that of eb are 
similar to some extent. Ordinarily, eb should follow the same direction as EB because it also 
indicates the above two types of response tendencies. If this happens, the implications of eb are 
said to have been strengthened and confirmed. 

The Experience Potential (ep) is another important variable of interpretative significance. 
This ratio is based upon derivation of data incorporated in eb and is equal to the sum of all 
nonhuman movement determinants plus all varieties of pure shading determinants and 
achromatic colour determinants. Thus ep may be written as under: 


ep = Σ(FM + m) + Σ(T + V + Y + C') 


Thus the ep (Experience Potential) is related to the eb (Experience Base) in the same fashion 
as the EA (Experience Actual) is related to the EB (Erlebnistypus or Experience Balance). The ep 
tends to indicate such needs and affects which are not properly organized or controlled. 
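
The relations among EA, eb and ep can be summarized in a small Python sketch. The function and parameter names are hypothetical, and the counts are invented for illustration; the only content taken from the text is the three formulas EA = M + ΣC, eb = Σ(FM + m):Σ(T + V + Y + C') and ep = Σ(FM + m) + Σ(T + V + Y + C').

```python
def experience_indices(m, sum_c, fm, m_inanimate, t, v, y, c_prime):
    """Compute EA, eb and ep from determinant tallies.

    m            human movement (M) responses
    sum_c        weighted sum of chromatic colour responses
    fm           animal movement (FM) responses
    m_inanimate  inanimate movement (m) responses
    t, v, y      texture, vista and diffuse-shading responses
    c_prime      achromatic colour (C') responses
    """
    nonhuman_movement = fm + m_inanimate
    shading = t + v + y + c_prime
    return {
        "EA": m + sum_c,                       # sum of both sides of the EB
        "eb": (nonhuman_movement, shading),    # ratio, kept as a pair
        "ep": nonhuman_movement + shading,     # sum of both sides of the eb
    }


print(experience_indices(m=5, sum_c=3.5, fm=3, m_inanimate=1, t=1, v=0, y=2, c_prime=1))
```

Keeping eb as a pair makes it easy to check whether it points in the same direction as the EB, which is how the text describes its confirmatory role.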

As students may often get confused when reading the discussion on the interpretation of the 
Rorschach responses, the author has provided a synthesized general outline to help them in 
interpreting scores (see box on next page). 

There is little agreement regarding the scientific status of the Rorschach test. The opponents 
claim that the test lacks a universally accepted standard of administration, scoring and 
interpretation; that evaluation of data is subjective; that results are unstable over time; and that 
it is unscientific and inadequate by all traditional standards (Exner 1995; Nezworski & Wood 1995; Wood, 
Nezworski & Stejskal 1996). 

The reliability and validity of the test have also been debated. Commenting upon the 
specific level of validity of the test, Buros (1970) has very clearly said, “The vast amount of writing 
and research has produced astonishingly little, if any, agreement among psychologists regarding 
the specific validities of the Rorschach.” A meta-analysis by Parker (1983) reported an overall 
internal reliability coefficient of .83, based upon 530 statistics from 39 papers published 
between 1971 and 1980. However, the meta-analysis by Parker was criticised on the grounds that 
results on validity were not analyzed separately from results on reliability (Garb, Florio & Grove 
1998). 


Lastly, weighing the pros and cons, it can be safely concluded that a final word on the 
Rorschach test is yet to come. More research is needed, but unless the practitioners can agree 
on a standard method of administration and scoring, the hands of the researchers will 
remain tied. 


* In Klopfer, Σ(T + V + Y + C') is substituted by Σ(Fc + c + C'), in which Fc and c represent texture 
responses and C' represents achromatic colour responses. 










A general outline for Rorschach interpretation 

Total responses (R) = 
Total time (T) = 
Average time per response (T/R) = 
Average reaction time for achromatic series (Cards I, IV, V, VI and VII) = 
Average reaction time for chromatic series (Cards II, III, VIII, IX and X) = 
Affective ratio = (number of responses to cards VIII, IX and X) / (number of responses to the remaining 7 cards) = 
Egocentricity index = [3(Fr + rF) + (2)] / R = 
F% = (F / R) × 100 = 
Lambda index = ΣF / (sum of non-pure F) = 
F+% = [ΣF+ / Σ(F+ + F−)] × 100 = 
FC:CF + C = 
A% = [(A + Ad) / R] × 100 = 
H% = [(H + Hd) / R] × 100 = 
Blends:R = 
W:M = 
W:D = 
Number of P = 
Number of O = 
Frequency of Z (or Zf) = 
(H + Hd):(A + Ad) = 
(H + A):(Hd + Ad) = 
ΣC = (FC + 2CF + 3C) / 2 = 
Erlebnistypus, or EB = M:ΣC = 
Experience actual, or EA = M + ΣC = 
Experience base, or eb = Σ(FM + m):Σ(T + V + Y + C') = 
Experience potential, or ep = Σ(FM + m) + Σ(T + V + Y + C') = 





THE HOLTZMAN INKBLOT TEST 
Modelled after the Rorschach test, Holtzman et al. (1961) developed an inkblot test 
known as the Holtzman Inkblot Test (HIT). Holtzman developed this test in order to remove some 
of the basic technical difficulties of the Rorschach test, like the unlimited number of responses, poor 
scorer reliability, etc. The test is available in parallel forms, each form (Form A and Form B) having 45 
cards. Both coloured and achromatic cards are included and a few inkblots are 
asymmetric. The subject is permitted to give one response per card and thus the total number of 
responses for any examinee is fixed, thereby automatically avoiding many of the difficulties in the 
scoring of the Rorschach test. Each response is followed by a simple two-fold question: where 
was the percept represented in the blot, and what about the blot suggested the percept? All the 
responses are classified under 22 response variables, which include the Rorschach variables plus 
variables like hostility, anxiety and pathological verbalization. The main advantages of the 
Holtzman test over the Rorschach test are objective and easy intercomparability of examinees, 
because of the limited and fixed number of responses, and availability of percentile norms for all 
the 22 variables for samples ranging from 5-year-olds to adults. Due to the above factors, many 
researchers have, in general, agreed that the Holtzman test appears to be better standardized than 
the Rorschach test. A group-test form of the Holtzman inkblot test, to be administered to more 
than one individual at a time, has also been recently developed (Holtzman et al. 1963). 


The major HIT scoring variables can be discussed as under: 


1. Reaction time: It is the time in seconds from the presentation of the inkblot 
to the beginning of the main response. 
2. Rejection: It is scored when the examinee or testee fails to report anything or returns the 
inkblot to the examiner. 
3. Location: It is scored on a 3-point system: 0, 1 and 2; 0 is scored for the whole blot, 1 for a 
large area and 2 for a smaller area. 
4. Space: It is scored when there is a true figure-ground reversal where the white part is the 
figure. 
5. Form Appropriateness: It is scored on three points: 0, 1 and 2; 0 for poor, 1 for fair and 2 
for good. It is a sort of goodness of fit of the concept to the form of the blot. 
6. Form Definiteness: It is scored on a 5-point scale ranging from 0 to 4; 0 for a formless 
concept and 4 for a highly formed concept. 
7. Shading: It is scored 0 to 2. Shading qualities such as texture and fuzziness are scored as determinants. 
8. Colour: It is a primary determinant and is scored 0 to 3. 
9. Movement: It is scored 0 to 4 and is reserved for those responses which imply energy 
or a dynamic movement quality. 
10. Pathognomic Verbalization: It is scored for absurd, queer and incoherent verbalizations 
to cards. 
11. Integration: It is scored 0 to 1. When two or more blot elements are effectively 
integrated in the response, it is scored 1; otherwise it is scored 0. 
12. Content scores: These include categories like Human, Animal, Anatomy, Sex and Abstract, each of which 
is scored 0 to 2 based upon absence, partial presence or full presence of the concept. 
13. Anxiety: Each response is scored 0 to 2 for signs of anxiety. 
14. Hostility: Each response is scored 0 to 3 for signs of hostility. 
15. Barrier: It refers to any protective covering that might be symbolically related to body 
image boundaries. If a barrier is present, it is scored as 1 and if absent, it is scored as 0. 
16. Penetration: It is scored 1 if the concept is symbolic of the testee’s feeling that his or her 
body's external parts can easily be penetrated. Otherwise, it is scored 0. 
17. Balance: It is scored 1 if the testee sees presence or absence of symmetry in the design. 
Otherwise, it is scored as 0. 
18. Popular: It refers to the commonness of the response and is scored 1 if the response is 
common. Commonness is determined on the basis of 1 in 7 normative protocols. 
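
Because each of the variables listed above has a fixed score range, a scoring sheet can be checked mechanically. The following Python sketch is only an illustration: the table and function names are hypothetical, the maxima are those stated in the list above, and variables for which no numeric range is given (reaction time, rejection, space, pathognomic verbalization and the individual content categories) are deliberately left out.

```python
# Maximum score per response for HIT variables whose range is stated above.
# Every listed variable has a minimum score of 0.
HIT_MAX_SCORE = {
    "Location": 2,
    "Form Appropriateness": 2,
    "Form Definiteness": 4,
    "Shading": 2,
    "Colour": 3,
    "Movement": 4,
    "Integration": 1,
    "Anxiety": 2,
    "Hostility": 3,
    "Barrier": 1,
    "Penetration": 1,
    "Balance": 1,
    "Popular": 1,
}


def out_of_range(scores):
    """Return the sorted names of variables whose score falls outside 0..maximum."""
    problems = []
    for variable, value in scores.items():
        maximum = HIT_MAX_SCORE.get(variable)
        if maximum is None:
            continue  # no range recorded for this variable; skip it
        if not 0 <= value <= maximum:
            problems.append(variable)
    return sorted(problems)


# Hypothetical scoring of a single response; Colour and Anxiety are mis-scored.
print(out_of_range({"Location": 2, "Colour": 4, "Movement": 4, "Anxiety": -1}))
```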
Administration and scoring of the HIT are well standardized. Its scorer reliability appears to be 
satisfactory. Investigations of the various other reliabilities, like split-half, alternate-form and 
test-retest reliability, show some differences, but their results are encouraging. 
Validity data on the HIT have also yielded satisfactory results. 

A recent variant of the HIT, called the HIT 25, is a short version consisting of the first 25 cards 
from Form A of the HIT with two responses per card; it was developed by Holtzman (1988). This 
shorter version has been found very useful for diagnosing cases of 
schizophrenia. A group form of the HIT, using slides, has also yielded scores on most variables 
that are well comparable to those obtained through individual administration (Holtzman, 
Moseley, Reinehr & Abbott 1963; Swartz & Holtzman 1963). 


Cassell’s Somatic Inkblot Series (SIS) 

Cassell’s SIS (1965, 1979, 1980, 1984) is another important recent development in the field of 
inkblot tests. He was not satisfied with the stimulus materials of earlier inkblot tests through 
which the sufferings of the patient were studied. These earlier tests were useful for understanding 
only the basic structure of personality. He appreciated the idea that emotional complaints of the 
patients could be easily understood through somatic processes. 




SIS, according to Cassell, is a sort of structured projective diagnostic test. It is projective 
because it is based on spontaneous, individualized responses to semi-ambiguous inkblot figures. 
It is structured because of its sequential presentation of research and with typical or atypical 
response potential. It is a diagnostic test in the sense that the interaction of structure and 
stimuli evokes meaning and symbolism unique to the individual, and this helps in differentiating 
typical and atypical persons. 

There are three forms of SIS: 

(a) Somatic Inkblot Series I (SIS-I): It consists of 20 inkblot images printed on cards like 
the Rorschach test. 

(b) Somatic Inkblot Series II (SIS-II): It consists of an 8-page booklet with 62 images printed in 
it. Each image is provided with a space below it for writing the response. 

(c) SIS-Video: This is a video version of SIS-II in which the video starts with pictures of 
flowers and soft music, followed by the images one by one. Each image stays on the screen 
for about 30 seconds. It is also designed for self-administration, where the examinee 
writes answers on a sheet. 

Cassell’s explanations were further crystallized and reinforced by Piotrowski (1957), Lerner 
(1991) and Schafer (1954). 


THEMATIC APPERCEPTION TEST (TAT) 


The Thematic Apperception Test, also known as TAT, is another projective test commonly used in 
clinical and nonclinical settings. The TAT was first published by Murray in 1935, under the head 
‘A method for investigating fantasies: the Thematic Apperception Test’ in Archives of Neurology 
and Psychiatry. Later on, Murray & Morgan (1938), working at the Harvard Psychological Clinic, 
published a book, Explorations in Personality, in which the details of analysis of the TAT appeared. As 
compared to the Rorschach test, the TAT has less vague and ambiguous pictures to which the 
examinee responds. As the Rorschach test and the TAT provide complementary information, a combination 
of these two has proved to be very effective, though the latter has been found to be much more 
effective in the comprehensive study of personality and in the interpretation of neurosis, 
psychosis, behaviour disorders and psychosomatic diseases. According to Murray (1971, 1), the 
purpose of the TAT is to reveal “some of the dominant drives, emotions, sentiments, complexes, 
and conflicts of personality. Special value resides in its power to expose the underlying inhibited 
tendencies which the subject or the patient, is not willing to admit or cannot admit because he is 
unconscious of them.” 




In the TAT, two terms are worth mentioning, namely, thematic and apperception. The term 
‘thematic’ has been derived from the term ‘thema’, which refers to a subject or topic on which a 
person thinks, speaks or writes. Murray has defined thema in a much broader sense which, of 
course, includes the above meaning. According to him, thema is defined as an interaction of the 
need and press variables. Murray has defined need as a hypothetical process within the organism, 
which stimulates it into either covert or overt action. Similarly, Murray (1971) has defined 
press as a force in the environment, which may facilitate or interfere with the satisfaction of the need 
of the organism. The concept of thema is, thus, based upon this ‘need-press’ theory and simply 
represents a patterning of these two basic variables, that is, need and press. The second term of 
the test, ‘apperception’, refers to a clear perception involving definite recognition or 
identification. Thus, apperception is different from perception in the sense that the latter may 
sometimes be vague or indistinct. The inclusion of the term is justified by the fact that the 
thema is apperceived, that is, the examinees not only perceive but also recognize the 
implications of the stimulus situation of the card. In this sense, the TAT is different from the 
Rorschach test because in the latter, examinees only perceive the stimulus of the card without 
making any definite recognition of its implications. 

The standard TAT consists of 19 cards containing vague or ambiguous pictures in black and 
white plus one blank card. There are thirty-one cards (30 pictured cards plus one blank card) in a 
series and these cards are used in various combinations depending upon sex and age. The 
maximum number of cards to be administered to any one individual is 20. Murray holds 
that the TAT has no utility for children below four years of age. Some cards are used with all subjects 
and some are used with one sex group or with a particular age group. Cards for a specific sex 
group or age group are distinguished by letter symbols: F for adult females, M for adult males, B 
for boys under 14, G for girls under 14, BM for both boys under 14 and adult males, GF 
for both girls under 14 and adult females, and BG for both boys and girls under 14. Cards having 
none of these symbols, such as 1, 2, 4, 5, 14, 15, etc., are meant for both sexes and all ages. All the 
20 cards are usually administered in two one-hour sessions, 10 cards in each session. The more 
unusual, dramatic and bizarre cards are usually selected for the second session, that is, 
pictures selected for the second session are less like everyday life situations. Murray (1971, 5) 
recommends an interval of at least 24 hours between the two sessions. Cards to be used in the first 
session are 1, 2, 3BM, 3GF, 4, 5, 6BM, 6GF, 7BM, 7GF, 8BM, 8GF, 9BM, 9GF and 10, whereas 
the cards to be used in the second session are 11, 12M, 12F, 12BG, 13MF, 13B, 13G, 14, 15, 16 
(blank card), 17BM, 17GF, 18BM, 18GF, 19 and 20. In a clinical setting, most clinicians use only 
10 cards selected according to the purposes at hand. Murray (1971) has provided separate 
instructions for normal adults and for children under 14. These instructions are approximately 
as follows: 


Instruction for Normal Adults in the First Session 
This is a test of imagination. I shall show you some pictures, one at a time, and your task will be to 
describe a story. Your story must include what has led up to the event shown in the picture, what is 
happening at the moment, what the characters are feeling and thinking, and the outcome. Express 
your thoughts as they come to your mind. Do you understand? You can devote about five minutes 
to each story. Here is the first picture. 


Instruction for Children under 14, or Poorly-educated Adults, or Psychotics in the First 
Session 
This is a story-telling test. I have some pictures, which will be shown to you, and for each picture I 
want you to tell a story. Your story should include what has happened before and what is 
happening now. Also write what the people are feeling and thinking and how it will come out. You 
can write any kind of story you want. Do you understand? Here is the first picture. You have five 
minutes for writing a story. See how well you can do it. 










Instruction for Normal Adults in the Second Session 
The procedure today is the same as before. Your first ten stories were excellent and coherent, but 
you mostly kept yourself confined to the facts of everyday life. Now I would very much appreciate 
seeing what you can do when you disregard the realities of everyday life and let your imagination 
have its expression, as in a myth, fairy story or allegory. Here is picture No. 1. 


Instruction for Children under 14, or Poorly-educated Adults, or Psychotics in the Second 
Session 
Today I shall show you some more pictures. These pictures are much better and more interesting. 
You told me some fine stories the other day. Now I want to see whether or not you can describe a 
few more. Describe them in an even more exciting way than you did last time if you can, like a 
dream or fairy tale. Here is the first picture. 
Card No. 16, which is a blank card, is preceded by a special instruction like this: 
See what you can see on this blank card. Imagine some picture there and describe the story. 
The responses of the examinee are recorded verbatim and comments by the examiner are 
allowed after the story for the first card is completed. The purpose of comments is to know 
whether or not an outcome has been mentioned, whether the examinee has taken a very long or 
short time in completion of the story and whether the story is very long or short. (The average 
length of a TAT story is 300 words. A story having fewer than 140 words indicates lack of 
co-operation and involvement by the examinee and hence is not worth scoring.) After this first 
story, Murray (1971, 4) recommends, “...as a rule it is better to say nothing for the rest of the 
hour.” However, the examiner is left free to intervene in the remaining stories if he thinks that 
intervention would improve the specific content of the story without actually influencing the 
content of the story itself. The intervention becomes absolutely necessary when the examinee is 
taking a longer time than necessary or a very short time in describing the story and/or if he leaves 
out a very important part such as the outcome of the story. If the examinee describes several 
stories rather than one story in a single picture, he should be encouraged to concentrate on only 
one long story rather than several stories. A too long story may be cut off by the examiner (Murray 
1971, 4). In this way all the ten cards are given successively and thus, the first session is over. After 
a time gap of one day, or 24 hours, the second session starts, which is completed in 
the same way. The cards of this session bear less resemblance to everyday life situations, and 
information to this effect is given to the examinee. In this session, the blank card is given to the 
examinee and here the examiner gives a slightly modified version of the instruction. When all the 
20 cards have been given, Murray (1971) recommends a process of interview which may be held 
either immediately or after a few days. The purpose of the interview is to know what sources or 
associations the examinee has used for the stories—whether the source of the ideas of the stories 
is his private experience, or novels or movies or the experiences of relatives and friends. 
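
The letter-symbol convention described earlier for matching cards to sex and age groups can be expressed as a small lookup. The Python sketch below is illustrative only: the names are hypothetical, the card labels are those listed for the two sessions, and the treatment of the MF suffix (card 13MF) as applying to adult males and adult females is an assumption, since the text does not define that symbol.

```python
# Card labels for the two sessions (31 cards in all, card 16 being the blank card).
TAT_CARDS = [
    "1", "2", "3BM", "3GF", "4", "5", "6BM", "6GF", "7BM", "7GF",
    "8BM", "8GF", "9BM", "9GF", "10",
    "11", "12M", "12F", "12BG", "13MF", "13B", "13G", "14", "15", "16",
    "17BM", "17GF", "18BM", "18GF", "19", "20",
]

# Which groups each letter symbol applies to; cards without a symbol suit everyone.
SUFFIX_TO_GROUPS = {
    "": {"man", "woman", "boy", "girl"},
    "M": {"man"},
    "F": {"woman"},
    "B": {"boy"},
    "G": {"girl"},
    "BM": {"boy", "man"},
    "GF": {"girl", "woman"},
    "BG": {"boy", "girl"},
    "MF": {"man", "woman"},  # assumption: not defined in the text
}


def cards_for(group):
    """Return the card labels applicable to 'man', 'woman', 'boy' or 'girl'."""
    selected = []
    for label in TAT_CARDS:
        suffix = label.lstrip("0123456789")  # strip the card number, keep the symbol
        if group in SUFFIX_TO_GROUPS[suffix]:
            selected.append(label)
    return selected


for group in ("man", "woman", "boy", "girl"):
    print(group, len(cards_for(group)), cards_for(group))
```

Under these assumptions each group receives exactly 20 cards, which matches the stated maximum number of cards administered to any one individual.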


The scoring and interpretation of the TAT stories is the next important task to be 
accomplished by the examiner. In the TAT, scoring is not separable from interpretation, 
whereas in the Rorschach test it is. Moreover, in scoring the Rorschach 
responses, most Rorschachers more or less agree with regard to the symbols and the criteria used, 
whereas no such agreement exists among the TAT users. As a matter of fact, no formal 
scoring (except Hero, Need, Press, Thema and Outcomes) of any fixed kind is used by the TAT 
users (Nunnally 1970). In such a situation, Murray’s own proposals regarding the interpretation 
of the TAT seem most appropriate. The TAT differs from the Rorschach test in one other basic 
sense, too. The purpose of the Rorschach test is to reveal the structure and organization of 
personality, whereas the purpose of the TAT is to reveal the contents of the personality. 
By way of creating stories, the examinee organizes his various 
experiences, which reveal the different aspects of his own personality. 




In interpreting the TAT stories, Murray makes the following two basic assumptions: 


1. The characteristics of the hero (also called the focal figure by some TAT users) of the story 
broadly represent the tendencies of the examinee’s personality. 


2. The situation which surrounds the hero of the story represents the various facets of the 
examinee’s life situation containing his past, present and future environment. 


These two assumptions are analyzed under the following six categories: 


The Hero 

The hero (either male or female) of the story is the central character around whom all the events
revolve. The interpreter recognizes the hero of the story by his principal traits like superiority, 
inferiority, leadership, belongingness, mental abnormality, criminality, quarrelsomeness, etc. It is 
the hero of the story with whom the examinee identifies himself and in whom he is most 
interested. The hero of the story plays the leading role in the story and is the main character who 
is associated from the beginning of the events to the outcome of the story. A story may have one 
hero or sometimes two, three or four heroes, which Murray has called a sequence of heroes. 
Sometimes, the hero characteristics are equally divided among equally significant characters. 
Such heroes are known as partial heroes. Sometimes, the story is such where the hero only listens 
to the events in which another character is involved. Such stories, therefore, contain a primary 
hero (who hears the events) and a secondary hero who narrates the events. 


Motives, Trends and Feelings of the Heroes 

Under this head, Murray explains that the interpreter should carefully analyze overt behaviour 
(both usual and unusual types) of the hero (that is, what the hero thinks, feels and does) because 
these overt behaviours reflect the needs (or drives) and emotions of the hero (and, therefore, of 
the examinee). A need may be expressed subjectively in the form of a wish, intention or impulse 
of the hero or it may be expressed objectively in the form of a trend of overt behaviour. Murray 
has given a list of 28 needs. Some of these are: n aggression (divided into four parts: emotional
and verbal, physical and social, physical and asocial, and destruction), n achievement, n
passivity, n dominance, n abasement, n nurturance, n sex, n succorance, n acquisition,
n affiliation, n autonomy, n deference, n cognizance, n blame avoidance, n harm
avoidance, etc. The letter n is the abbreviation used for need. Murray further suggests that the
manifestation of these needs should be rated on a 5-point scale ranging from 1, representing a 
mild form of the need, to 5, representing the intense form of the need. In recognizing the various 
needs, due attention is also given to the intensity, duration, frequencies of their occurrence in 
different stories and the general significance of the story. After all the 20 stories are scored on a 
5-point scale, a total score for each need is computed and compared with the standard score for 
the examinee of given age and sex. Those needs whose total scores are very high or very low are 
carefully analyzed in relation to each other. Some of the interpretive hints are like this: A male 
examinee showing unfriendly motives and actions to female characters in most of the TAT stories 
indicates that he has some troubled relations with women in his life. Likewise, if a hero of most of 
the stories displays disappointment, embarrassment and failure, it shows that the person feels 
defeated and depressed in his actual life. Special attention is also paid to the unique materials,
which significantly differ from the common and popular responses to each picture because these
unique and unusual materials may have special significance and relevance in the examinee's
life.
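The totalling step described above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the example ratings, the "standard scores" and the margin used to call a total very high or very low are all hypothetical and are not taken from Murray's manual.

```python
from collections import defaultdict


def total_need_scores(story_ratings):
    """Sum the 1-5 rating of each need over all the stories.

    story_ratings is a list with one dict per story, mapping a need
    label (for example "n achievement") to its rating in that story.
    """
    totals = defaultdict(int)
    for ratings in story_ratings:
        for need, rating in ratings.items():
            if not 1 <= rating <= 5:
                raise ValueError(f"rating for {need} must lie between 1 and 5")
            totals[need] += rating
    return dict(totals)


def flag_extreme_needs(totals, standard_scores, margin):
    """Mark totals lying at least `margin` above or below the standard score.

    Both standard_scores and margin are hypothetical inputs; real values
    would come from norms for examinees of the given age and sex.
    """
    flags = {}
    for need, total in totals.items():
        if need not in standard_scores:
            continue
        difference = total - standard_scores[need]
        if difference >= margin:
            flags[need] = "high"
        elif difference <= -margin:
            flags[need] = "low"
    return flags


# Hypothetical ratings for three stories (a full protocol would have 20).
stories = [
    {"n achievement": 4, "n aggression": 1},
    {"n achievement": 5},
    {"n aggression": 2, "n affiliation": 3},
]
# Hypothetical standard scores for the examinee's age and sex group.
norms = {"n achievement": 5, "n aggression": 6, "n affiliation": 3}

totals = total_need_scores(stories)
print(totals)
print(flag_extreme_needs(totals, norms, margin=3))
```

The flagged needs are the ones an interpreter would then examine in relation to each other. The same totalling could be applied to press ratings.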


Forces of the Hero’s Environment 

Under this head of the TAT stories, emphasis is placed upon environmental variables or forces
(called press), which can either facilitate or interfere with the gratification of needs of the hero.
These environmental variables include details of objects and persons noting their uniqueness,
intensity and frequency. Special attention is also paid here to include objects and persons which
are not in the picture but invented by the examinee. More than 30 such press (the plural is also press)
have been listed by Murray. A few examples are p rejection, p dominance (divided into three
parts: coercion, restraint, inducement or seduction), p physical injury, p aggression (divided into
four parts: emotional and verbal, physical and social, physical and asocial, and destruction of
property), p affiliation (divided into two parts: associative and emotional), p nurturance,
p physical danger (divided into two parts: active and insupport), etc. (The letter p is based on the
analogy n and refers to press.) All these environmental forces (or press) are classified according
to their effect (promised or realized) upon the hero. The strength of press in each story is rated
on a 5-point scale of which 1 indicates the minimum strength and 5 indicates the maximum
strength. Like the criteria of need, here also the criteria are intensity, duration, frequency and
general significance in the story. After rating all the 20 stories on a 5-point scale, the total score
for each press is computed and then, it is compared with the standard scores for the examinee of
a given sex and age. The total scores, which are very high or low as compared with standard
scores, are carefully examined in relation to each other.


Outcomes 

Outcomes refer to how the story concludes. There may be a happy ending or an unhappy ending. 
In assessing outcomes, the comparative strength of the forces emanating from the hero and the 
strength of the forces emanating from the environment are compared and analyzed; the 
magnitude of frustration and hardships shown by the hero as well as his success or failure in
handling those situations is also analyzed.


Themas 

Themas refer to the interaction of need(s) of the hero with environmental forces, that is, press,
combined with the successful or unsuccessful outcome arrived at by the hero of the story. The
purpose of a thema is to study the different types of forces in their relationships and to determine
the most important problem of the examinee from the interaction of needs and factors leading to
the gratification of those needs. Obviously, then, a thema is nothing but a synthesis of factors
analyzed into hero, need, press and outcome. Themas may be simple or complex.


Interests and Sentiments

Under this heading, choice of topics of the story and ways of handling them are analyzed. The
fundamental assumption here is that the hero's interests and sentiments as revealed by the choice
of the topics and the way in which they are handled are the examinee's own interests and
sentiments.

Other TAT interpreters like Tomkins, Wyatt, Rapaport, Bellak, Arnold, etc., have suggested
different techniques for analyzing the stories. Among all the different techniques, Murray's
technique with some modifications is popular and frequently used. The most widely used
categories for analyzing the TAT stories are hero, need, press, outcomes and themas.

In arriving at a conclusion on the basis of the analysis of TAT stories, the following points
should be kept in view:

1. Normally, about six out of the twenty stories, that is, approximately 30% of the stories
should fall under the impersonal category. This category consists of impersonal elements like the
elements given in the picture, elements invented at the moment, fragments read in novels or
witnessed in movies, etc., none of which represents the important determining tendency of the
personality of the examinee. If more than 30% of the stories fall in this category, it indicates that
the examinee is not involved in the task or the content is psychologically irrelevant or the
test itself has not been skillfully administered.

2. Murray (1971) distinguishes between two levels of functioning: the first level which
includes physical and overt verbal behaviour, and the second level which includes plans,
ideas, dreams, fantasies, etc. TAT has been designed to reveal the second level and not the first
level of functioning, although on the basis of the second level a guess regarding the first level can
be made.

3. Murray (1971) has distinguished between three layers of normal personality: outer layer,
middle layer and inner layer. The inner layer consists of repressed tendencies or wishes, which
are never (or rarely) expressed in thought (that is, the second level), and never (or rarely)
expressed in action (the first level). The middle layer consists of desires and tendencies, which
can be expressed in thought in undisguised form (the second level) and can sometimes also be
expressed in action (the first level). The outer layer consists of those wishes, desires and
tendencies, which are frequently expressed publicly (the second level) and openly expressed in
behaviour (the first level). The skillful interpreter must find out to which of these layers the
different important variables such as need, press, etc., belong. Generally, the stories described in
response to the first ten cards (given in the first session) tend to reveal the outer layer of the
personality and the stories described in the second session regarding the remaining ten pictures
tend to express the inner and middle layers of personality.

The above points should be carefully dealt with in arriving at a final conclusion. Murray 
warns that any conclusion arrived at on the basis of TAT stories should be regarded only as 
working hypotheses (to be verified by other techniques) rather than proved facts. As Murray
(1971, 15) has said: 

“The TAT draws forth no more than twenty small samples of the subject’s thoughts. To 
suppose that these will invariably provide a skeleton of the total personality is unduly 
optimistic...there are sets of TAT stories...from which it is impossible to infer the underlying 
determinants of character.” 

In India, TAT has been adapted by Dr Uma Chaudhary (1960). This adapted version of TAT
has 14 cards which have environmental details and human figures in the context of the Indian
set-up. Out of these 14 cards, Card No. III and Card No. IV are gender-specific. For example,
Card No. III MB is reserved for male testees whereas Card No. III FG is reserved for female testees.
Likewise, Card No. IV also has MB and FG series. Other cards are common for both genders.


Derivatives of TAT 
Like the Rorschach test, there are some derivatives of the TAT. Modelled after the TAT, a few
pictorial tests have been developed. The Children's Apperception Test (CAT) is one such test. The
CAT, developed by Bellak (1954), is meant for children between three and ten years. The test
consists of ten cards, which are slightly smaller than the standard TAT cards. The CAT pictures
substitute animals for human beings, the assumption being that animals tend to produce fantasies
more readily than human beings among children and therefore, children can easily project to
animal pictures as compared to pictures of human beings. These animal pictures are, however,
presented in anthropomorphic fashion. All the CAT pictures intend to arouse fantasies relating to
sibling rivalry, toilet-training, aggression, etc. Recently, Bellak & Hurvich (1966) have prepared a
human modification of the CAT, which is meant for use with children beyond 10 years. Older
children are more capable of projecting to human figures and as such, the human modification of
the CAT (known as CAT-H) was prepared. The Rosenzweig Picture-Frustration (P-F) Test is
another test modelled after the TAT. The P-F test, developed by Rosenzweig in 1942, is slightly
different from the TAT and the CAT. The latter are picture-story tests whereas the P-F test is a test
where the examinee is required to write a short bit of conversation in the caption box of each
picture. The test has two forms: adult's form (for 14 years and older) and children's form (for
children of 4 to 13 years). In both the forms each picture consists of cartoon-like drawings having
two principal characters. One of the principal characters is engaged in a frustrating situation
common in life and the other character is saying something, which calls attention to that
frustrating situation. The examinee is requested to write in the caption box of each picture what
the reply would be of the frustrated character. He is also instructed to write the very first reply that
comes to his mind. In administering the P-F study, it is assumed that the examinee identifies
himself with the frustrated character and projects his own frustrated tendencies in the form of the
reply of that character. The P-F test is based upon the frustration-aggression hypothesis, which
stipulates that frustration always leads to aggression but since the individual has the capacity for
frustration tolerance, the aggression produced by frustration may not be expressed outwards or
may even be denied. The frustrating situations included in the P-F study are of two
types: ego-blocking and superego-blocking. An ego-blocking situation is one in which some
obstruction, mostly personal in nature, disappoints one of the two principal characters directly
and a superego-blocking situation is one where one of the two principal characters is directly
insulted or incriminated by the other character.

The P-F study aims at studying the types of aggression and the direction of aggression.
According to Rosenzweig, there are three types of aggression and three types of direction of the
aggression. The three types of direction are: extrapunitive or directed outwards or towards the
environment (also known as extra-aggression); intropunitive or directed inwards or towards the
self (also known as intra-aggression); and the impunitive or evading the frustrating situation or
denying its presence. The three types of aggression are: obstacle-dominance in which the
emphasis is upon basic events which acted as an obstacle in producing the frustration;
ego-defence in which the emphasis is upon the protection of experience of the frustrated
individual; and need-persistence in which the emphasis is upon the solution or remedy of the
problem which produced the frustration. In scoring the responses of the P-F study, the
percentages of the responses falling into each of these three categories of direction and types of
aggression are computed, and then compared with some normative percentages.

Other TAT derivatives include the Object Relation Technique, the Pickford Projective
Pictures (PPP), the Blacky pictures, etc. A detailed discussion of these varieties is beyond the
scope of this book.


VERBAL TECHNIQUES

Verbal techniques are those techniques in which stimulus materials are verbal (and not pictorial),
and where the examinees are also required to give their responses verbally towards those
stimulus materials. Verbal techniques are different from pictorial techniques in the sense that in
the former, both the stimulus and the response are verbal, whereas in the latter the response is
verbal and the stimulus is pictorial. The word-association test and the sentence-completion test
are the best examples of verbal techniques.


Word-Association Test

The word-association test is one of the most popular projective devices for studying personality.
This test requires the examinee to tell the very first word that comes to his mind after listening to
the stimulus word by the examiner. The examiner notes down the responses and the reaction time
taken to respond towards each stimulus word.

The word-association test was first described by Galton in 1879. Later on, the early
experimental psychologists like Wundt and Cattell introduced this technique to study thinking
processes. The clinical psychologists also became interested in the technique and started using it
as a means of exploring emotions, conflicts and anxieties. Kraepelin used this technique in the
exploration of personality of mentally ill persons. However, Jung in 1910 made a systematic effort
to introduce the technique in his psychoanalysis. He presented a standard list of 100 words (some
neutral and some emotionally toned) and analyzed the responses of the examinees from different
angles to get a variety of diagnostic clues. His main analysis included the analysis of time taken in
responding (reaction-time) and the content of the responses. A longer time was taken to be
indicative of some conflicts, anxieties, repressed experiences and a sign of emotional
embarrassment. Delays indicated that those words touched off painful experiences and 
threatened to bring to light the anxiety-provoking or guilt-laden materials or were close to the 
repressed experiences. The content of the responses is analyzed from the angle of contrast 
(black-white), being unusual (chair-sea), supraordinate (cow-animal), clang association 
(land-hand), etc. A large number of unusual responses such as ‘Pen-Bear’, etc., are taken to be 
indicative of mental illness. Jung had also studied the retest behaviour in the word-association
test. He obtained the responses towards his preselected words (which included both neutral and 
emotional words) and then, the examinee was again administered the test with the instruction to 
recall the responses given previously. Changes in responses were taken as important clues for 
further exploration of emotional complexes of the individual. Later on, Kent and Rosanoff also 
developed their own test for use as a screening instrument in psychiatric clinics. The test is known 
as the Kent-Rosanoff Free Association Test, which consists of 100 words belonging to the neutral
and common category. On the basis of the responses to each word of the examinees, the 
frequency tables of the responses of each word are prepared. An ‘Index of commonality’ (or the 
median frequency value) is calculated on the basis of the frequency tables. Kent and Rosanoff 
found that mentally ill persons exhibit a lower index of commonality, that is, they tend to give 
uncommon or individual responses (or less common responses) more than the normal ones. 
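The index of commonality described above can be illustrated with a short Python sketch. It assumes, as the passage states, that the index is the median of the frequency values attached to an examinee's responses; the three stimulus words, the tiny norm group and the function names are hypothetical and serve only as an example.

```python
from collections import Counter
from statistics import median


def build_frequency_tables(norm_responses):
    """Build one frequency table per stimulus word from a norm group.

    norm_responses maps each stimulus word to the list of responses
    given to it by the members of the norm group.
    """
    return {
        stimulus: Counter(response.lower() for response in responses)
        for stimulus, responses in norm_responses.items()
    }


def index_of_commonality(examinee_responses, tables):
    """Median frequency value of one examinee's responses.

    A response that nobody in the norm group gave has a frequency of 0.
    """
    values = [
        tables[stimulus][response.lower()]
        for stimulus, response in examinee_responses.items()
    ]
    return median(values)


# Hypothetical norm data: five norm-group responses to each of three words.
norm_group = {
    "table": ["chair", "chair", "chair", "food", "wood"],
    "dark": ["light", "light", "night", "light", "black"],
    "man": ["woman", "woman", "woman", "boy", "woman"],
}
tables = build_frequency_tables(norm_group)

common = {"table": "chair", "dark": "light", "man": "woman"}
uncommon = {"table": "wood", "dark": "sea", "man": "boy"}

print(index_of_commonality(common, tables))
print(index_of_commonality(uncommon, tables))
```

An examinee who gives mostly individual responses obtains a lower index than one who gives the popular responses, which is the pattern Kent and Rosanoff reported for mentally ill persons.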


Still another important word-association test was developed by Rapaport and his associates 
in 1946. The test consists of 60 words and aimed at exploring the areas of aggression and 
sexuality with special emphasis upon oral, anal and phallic levels of psychosexual development. 


Sentence-Completion Test 
The sentence-completion test consists of a series of partial or incomplete sentences and the 
examinees are required to complete them. The examinees are free to take a decision in 
completion of the meaning, and no time limitation is imposed for their decision. Originally, 
sentence completion was used by Ebbinghaus as a nonprojective technique of assessing 
intellectual level. A serious attempt to use this test to assess personality traits began in the 1930s, 
and one of the earliest tests of this type was developed by Rohde & Hildreth in 1940.
Subsequently, several sentence-completion tests were developed. A few examples of 
sentence-completion items are as follows: 

I feel tense ......

I feel that sex ......

My sex life ......

I feel guilty about ......

Sex relations ......

The responses on the sentence-completion tests are analyzed in a more or less similar way to
those of TAT stories, that is, in analyzing these responses, the motives, expectations, moods, etc.,
are interpreted in order to arrive at a meaningful conclusion. A few popular sentence-completion
tests are the Rotter Incomplete Sentences Blank, OSS test, etc.


EXPRESSIVE TECHNIQUES

Expressive techniques are those where the examinee is given an opportunity to express
himself in the form of certain drawings, finger-paintings, play, role play, handwriting, etc. In such
techniques the ways in which the individual manipulates or works with the materials are
important (and not the end product of these manipulative acts) because it is assumed that through
these manipulations he expresses his needs, motives, emotions, conflicts, etc. Some examples of
expressive techniques are the Machover Draw-a-Person (D-A-P) test, Buck's House-Tree-Person
(H-T-P) test and the World test, etc. Some of the important expressive techniques may be studied
under the following headings.


Figure-Drawing Tests 
In figure-drawing tests, the examinee is given a sheet of paper and pencil to draw the figure of a
person and in such drawings he is assured by the examiner that this is not a test of his
drawing ability and thus, is encouraged in his efforts. The two most common figure-drawing tests
are the Machover Draw-a-Person (D-A-P) test and Buck's House-Tree-Person (H-T-P) test. In the
D-A-P test, the examinee is instructed to draw a person. If the first drawing is male, he is asked to
draw a female and vice versa. The examiner systematically notes down the sequence in which
the different parts of the body are shown. He also notes down the other procedural details. The
drawings are sometimes followed by inquiry to elicit specific information regarding the details of
drawings. In an H-T-P test the examinee is required to draw the figure of a house, a tree and a
person. According to Buck, the figure of the house is taken to be indicative of the associations
concerning the examinee's home and those living therein; the figure of the tree, of the
associations relating to his life role and his ability to derive gratification from the general
environment; and the figure of a person, of the interpersonal relationship of the examinee.

All such figure-drawings (emerging from the D-A-P test and H-T-P test) are interpreted from
three angles: (i) analysis of the general overall impression of the drawings; (ii) analysis of the
structural features of the drawings; and (iii) content analysis of the drawings. The analysis of the
general overall impression of the figure concerns itself with the posture of the figure (giving the
impression of action or being static), the facial expression, and other similar features. The overall
general impression may reveal the impression of expansiveness, hostility, aggressiveness, and
submissiveness. The structural analysis of the figure is concerned with factors like the size of the
figures, the pressure of the lines, the sequence of the parts drawn, their position on the given
page and the like. The size of the figure is said to be directly related to the self-esteem of the
examinee. The examinee who thinks about himself in a degrading fashion (poor self-esteem) is
likely to draw a very small figure whereas the examinee who thinks about himself with an air of
superiority often draws a big figure, sometimes so big that it requires two sheets of paper for a
single figure. The content analysis of the figures is concerned with emphasis upon the different
parts of the body given in drawings and depiction of clothing and accessories. It has been
demonstrated that emphasis upon drawings of head at the expense of other parts of the body
reveals several important clues to personality. A disproportionately bigger head is indicative of
organic brain disorder. In some examinees, it may suggest an overvaluation of intellectual work.
Drawings of nude human figures with emphasis upon genital organs and breasts are likely to
indicate psychosexual conflicts; emphasis upon eyes and ears indicates paranoid tendencies;
emphasis upon lips in male drawings tends to indicate homosexual tendencies, and emphasis
upon hands tends to indicate sexual preoccupations, hostility, aggression and also the guilt-laden
tendencies. Sometimes, the hands are omitted from the figure and this omission is taken to be
indicative of self-punishment as a consequence of an unconscious feeling of guilt. Such
omissions are mostly found in the drawings of sex offenders and juvenile delinquents. The
clothing and accessories have also some meaning in drawings. The emphasis upon pockets on
the shirt of male figures drawn by a male examinee is taken to be indicative of homosexuality.
Holding of a knife by the figure drawn indicates aggression and hostility. Emphasis upon the belt
in the drawing of a person (male or female) tends to indicate strong tendencies to control sexual
impulses. Incapability to draw a figure of the opposite sex is taken to be indicative of latent
homosexuality on the part of the examinee.




Toy Tests 

In toy tests, toys like dolls (representing adults and children of both sexes), puppets and
miniatures are given to the examinees, mostly children, who are allowed to play with the given
objects in the way they desire. The examiner carefully notes down the items chosen by a child for
playing, the manner in which he handles them, his emotional verbalization and other overt
behaviour (Bell 1948). A child's play with these objects is expected to display his sibling rivalries,
conflicts, fears, aggression, etc. One popular example of a toy test is the World test developed by
Lowenfeld in 1939 and later revised and restandardized by Bolgar and Fischer in 1947, and by
Buhler, Lumry and Carroll in 1951. The test consists of 150 to 300 miniature pieces of objects like
trees, cars, fences, houses, bridges, animals, people, etc. The examinee is requested to construct
whatever he likes from these materials normally placed on a large table. The examiner notes
down the objects chosen, the sequence in which the objects are made, expressive reactions like
verbalizations, emotional expression and other overt behaviours. All these items of information
are considered important for providing interpretative clues.


Artistic Productions 

Artistic productions like drawings, finger-paintings, brush paintings, clay modelling and 
sculpture are some of the common projective devices through which unconscious dynamics of
the examinee may be judged. It is a common observation that mentally ill persons display
distortions, lack of symmetry, disproportions, stereotype and similar features in their paintings or 
drawings. Depicting ambitious maps, plans, projects and complex designs have been found to 
reveal paranoid tendencies. Emphasis upon sexual themes in painting or drawing reveals 
disturbances in psychosexual development. The colours chosen for painting also provide 
important clues to the personality. 


Graphology 

Graphology (or handwriting) is also considered as an important form of the expressive projective 
technique, which provides a clue to the personality. One of the first attempts to study personality 
on the basis of handwriting was made in 1622 by Camillo Baldi, an Italian physician. However,
the scientific analysis of handwriting as a means of studying personality in modern times started 
in 1875 when J H Michon, a French scientist, published his book System of Graphology. This 
attempt to study graphology became recognized in 1896 when L Klages established the 
Graphological Society in Germany. Klages made a systematic analysis of handwriting of hysteria 
patients. Robert Saudek studied the handwriting of different persons very extensively and his
attempt reached a point of culmination with the publication of his book entitled The Psychology 
of Handwriting in 1925. Since the publication of this book, a vast interest has grown among 
scientists for studying handwriting as a means of evaluating personality. 

The study of handwriting characteristics of children as well as adults has revealed several
interesting points. Children having strong self-confidence and emotional stability tend to write
longer letters with strong pressure whereas bewildered and preoccupied children usually write
short letters with low pressure and hesitating expression at the beginning of letters. In severely
disturbed children several alterations of the forms of the letters such as the exaggeration of the
size of letters, splitting of the letters and words, etc. were found to be most obvious. Likewise,
studies of adults' handwriting have revealed that persons having brain disorders show an erratic
flow in the size of the letters because of the impairment of motor control essential for
letter writing. The writing of such persons is careless, showing frequent blotting and omissions.
Very slow writing with a downward tendency in words or lines as well as shrinking of
the letters of the lines displays the depressive mood of the examinee.







EVALUATION OF PROJECTIVE TECHNIQUES 

Readers, by now, should have a clear picture regarding the different projective techniques. Some
of these techniques such as the Rorschach test and the Thematic Apperception Test are very
popular, whereas others are little known, partly because they do not permit objective
measurement. To evaluate separately each of the projective techniques discussed so far is very
difficult, in the sense that it would require a separate chapter in itself. As such, an overall
evaluation of projective techniques, in general, is presented below.


Fakability 

In projective techniques, the answers given by the examinees are not obviously tied to their
self-image. As such, the examinees have little reason to distort their answers. In self-report
inventories, situations are more structured and direct. Before making his answers known to
others, the examinee thinks a lot and ultimately moulds the true responses most of the time in the
light of what is considered socially desirable or undesirable. Thus, one of the important
advantages of projective techniques over the self-report measures of personality is that they are
not susceptible to the fakability of responses. Despite this, there is evidence to support the view
that examinees can fake their responses even on standard projective techniques like the RT and
the TAT when they are specially instructed to modify or change their responses (Masling 1960).


Objectivity 
It is said that the projective techniques are less objective than the self-report inventories. In 
reality, neither the projective tests nor the self-report inventories are objective. Both possess an 
element of subjectivity but in different ways. Projective tests are said to be subjective in scoring 
because scoring a response is largely dependent upon intuition and the experiences of the 
examiner. This automatically means that the results of any two or more examiners cannot be
totally identical. Above all, the interpretation of scores in projective techniques is even more
coloured with the element of subjectivity so that quite often the interpretation reveals more
about the personality of the examiner than the examinee. As a matter of fact, the interpretation for
the examiner becomes as projective as the test stimuli for the examinees (Masling 1960). The
self-report inventories, on the other hand, no doubt possess the trait of objectivity in scoring, but
the validity of the scores obtained on these tests is largely dependent upon the subjective
processes of the examinees. The examinees may not give the true answers, which may invalidate
the results, irrespective of the fact that the scoring was objective.


Standardization

Most projective techniques are unstandardized, that is, they lack uniformity in scoring and
interpretation. The examiners have a free play in scoring and interpreting the responses
on standard projective techniques like the RT, the TAT and the CAT. Not only that, it has also been
found that the sex of an examiner influences the types and number of responses an
examinee gives on the RT and the TAT (Masling 1960). A male examiner tends to get a particular
type and number of responses from a female examinee. Self-report inventories, on the other hand,
are mostly standardized and therefore, chances of free play by the examiners are the least.


Reliability
The reliability of most projective techniques is usually very low. There are several reasons
for this. First, there is no uniform standard way of scoring and interpreting the responses
on projective techniques. As a consequence, scorer reliability is very poor. Second, such
techniques do not yield consistent scores upon retesting of the same subjects. It is also not
possible to calculate the alpha coefficient (internal consistency coefficient) of most projective
techniques because the items are not usually comparable. The only logical estimate of reliability
of such techniques is through the parallel-form method. However, most projective techniques
(except the Holtzman inkblot test) do not have alternate forms. On the whole, the computation of
reliability of projective techniques is very difficult and whatever reliability coefficient has been
obtained is, in general, low.
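
The alpha coefficient referred to in this section can be illustrated with a minimal sketch. It assumes a small, made-up table of item scores (one row per examinee, one column per comparable item) and applies the usual formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The function name and the data are hypothetical and are not taken from any study cited here.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Alpha coefficient for a table of scores (rows = examinees, columns = items)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))                      # one tuple of scores per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical scores of five examinees on four comparable items
ratings = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

The calculation presupposes that every examinee answers the same set of comparable items, which is exactly the condition most projective techniques fail to meet.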


Validity 

Like reliability, the validity of most projective techniques is unsatisfactory. Most of the published
validation studies on the projective techniques report criterion-related validity and most of such
validity coefficients are inconclusive and debatable. In reality, most of the traits measured by the
common and standard projective techniques such as the RT, the TAT, the CAT and the
Rosenzweig P-F Study are such that they require construct validity, which has not been obtained.


Situational Variables


Situational variables like physical appearance of the examiner, emphasis upon certain types of
responses by the examiner and changed instructions are likely to influence the responses on the
projective techniques. Some examiners may have a very formidable physical appearance, which
is likely to affect the examinees' capacity to imagine and think and to make them resort to
defensiveness. The overall impact of these factors is that they affect the response productivity on
projective techniques. Examiners who do not possess a frightening appearance and who tend to
encourage the examinee are likely to influence the responses of the examinees favourably. Not
only this, it has also been shown that changed instructions influence the score on projective
techniques to a great extent. For example, in one study it was demonstrated that when the
Holtzman Inkblot test was presented as an intelligence test (thus with a changed instruction), the
responses of the examinees were found to be more changed than when they were obtained under
the usual instruction of the test (Herron 1964).


General Applicability 


Of the three general types of projective tests, the pictorial and expressive techniques have wider 
applicability than verbal techniques because they can be used with illiterates, small children as 
well as with examinees having speech defects. In general, the self-report inventories which are 
suited only to literates, have poor applicability, particularly when compared to the expressive 
and pictorial techniques. 


Eysenck (1959) has summarized his viewpoint on the critical evaluation of projective 
techniques as mentioned below: 


1. No consistent, meaningful and testable theories underlie the projective techniques.

2. No marked correlation exists between the indicators of a projective test and the
intellectual abilities as measured or rated independently.

3. There is no evidence which shows a relationship between global interpretation of
projective techniques by experts and psychiatric diagnosis.

4. There is no empirical evidence for most of the postulated relationships between
projective-test indicators and personality traits.

5. There is no evidence that the conflicts, motives, needs and fantasies diagnosed by the
projective techniques will yield congruent results when interpreted by psychiatrists
separately.

6. Ample evidence is available to show that the majority of studies of projective
techniques have flaws in their methodology, are ill-designed and are full of statistical
errors in analysis of data.

7. Projective techniques, in general, have poor ability to predict failure or success in
different fields of life.





Review Questions

1. What is meant by projective technique? Discuss the major classification schemes of
projective techniques.

2. Discuss projective technique as a tool of psychological research. Assess its effectiveness.

3. Assuming some data, explain the importance of different types of scores in a
Rorschach protocol.

4. Evaluate the Rorschach test and the Thematic Apperception Test as tools of
psychological research.

5. Indicate the merits and demerits of the projective method of obtaining data.

6. Write short notes on:
(a) Verbal techniques
(b) Figure-drawing test
(c) TAT
(d) Projective method of collecting research data


12


TECHNIQUES OF OBSERVATION 
AND DATA COLLECTION 


CHAPTER PREVIEW

• Methods of Data Collection
• Questionnaire and Schedule
    Wording of Questions in a Questionnaire
    Characteristics of a Good Questionnaire
    Order of Questions
    Functions of a Questionnaire
    Types of Questionnaires
    Advantages and Disadvantages of Fixed-response Questionnaires
    Advantages and Disadvantages of Open-end Questionnaires
    Advantages and Disadvantages of Mailed Questionnaires
    Advantages and Disadvantages of Face-to-Face Administered Questionnaires
• Interview
    Types of Interviews
    Major Functions of an Interview
    Factors Affecting the Uses of Interviews
    Advantages and Disadvantages of Interviews
    Important Sources of Errors in Interviews
    Selection and Training of Interviewers
• Content Analysis
    Purpose of Content Analysis
    Methods of Content Analysis
    Evaluation and Limitation of Content Analysis
• Observation As a Tool of Data Collection
    Purpose of Observation
    Important Types of Observation
• Difference between Participant Observation and Nonparticipant Observation
• Rating Scale
• Types of Rating Scales
    Numerical Rating Scale
    Graphic Rating Scale
    Forced-choice Rating Scale
• Other Special Types of Rating Scales
    Q-sort Technique
    Semantic Differential Scale
    Behaviourally Anchored Rating Scale
    Nominating Technique (Sociometry)
• Problems in Obtaining Effective Ratings
    Factors Affecting Rater's Willingness
    Factors Affecting Rater's Ability
• Methods of Improving Effectiveness of Rating Scales
    Refinement in Stimulus Variables of Rating Scales
    Refinement in Response Variables of Rating Scales
    Improvement in Rating Procedures
• Errors in Ratings
    Halo Effect
    Error of Severity
    Error of Leniency
    Error of Central Tendency
    Contrast Error
    Proximity Error
    Logical Error
• Evaluation of Rating Scales
• Meaning and Features of Secondary Data

This chapter discusses those research tools or data-gathering devices which are very common in 
behavioural researches. The obtained data are of two types: primary data and secondary data. 
First we shall deal with different aspects of primary data and then we shall describe secondary 
data and its related aspects. By primary data, we mean the data that have been collected 
originally for the first time. There are different types of research tools and each of them utilizes 
distinct ways of describing and quantifying the obtained primary data. Some of the common 
research tools are: questionnaire, opinionnaire or attitude scale, interview and interview
schedule, observation method, content analysis, projective test, rating scale, sociometric test,
Q-sort and semantic differential. Some of these data-gathering devices will be discussed in
separate sections.


METHODS OF DATA COLLECTION 


Based upon the broad approaches to information gathering, data are categorized as: 
(i) Primary data 
(ii) Secondary data 


As discussed above, primary data are those data which have been collected from primary
sources on a first-hand basis by the researcher. For example, determining job satisfaction of the
employees of an organisation, assessing the attitude of the students towards teachers and
ascertaining the quality of services provided by the workers are examples of collecting primary
data. Secondary data are collected from secondary sources, such as using census data for obtaining
information regarding the age-sex structure of the population. Some other examples of secondary
data are the use of hospital records for finding out the occurrence of dengue fever among different
age groups, and collecting data from books, journals, newspapers and periodicals for obtaining
historical and some other types of information. Thus, secondary data are collected in a second-hand
way from secondary sources. Figure 12.1 provides the details of the different methods of data
collection.


[Figure 12.1 is a tree diagram. Methods of data collection divide into primary sources and
secondary sources. Primary sources: interview (structured, unstructured), observation
(participant, non-participant) and questionnaire. Secondary sources: census data, hospital
records, school records, earlier research, govt. publications and client histories.]

Figure 12.1: Different methods of data collection


QUESTIONNAIRE AND SCHEDULE 


A questionnaire is used where factual information from the respondents is desired. It consists of a
form containing a series of questions where the respondents themselves fill in the answers.
A questionnaire must be distinguished from a schedule, an opinionnaire and an interview guide.
A schedule consists of a form containing a series of questions, which are asked and filled in by the
investigator in a face-to-face situation. An opinionnaire is an information form which attempts to
measure the attitude or belief of an individual. Hence, an opinionnaire is also called an attitude
scale. When factual information is desired, a questionnaire is used, but when opinions rather
than facts are desired, an opinionnaire or attitude scale is used. An interview guide consists of a
list of basic points or topics to be covered by the interviewer during the interview. A
questionnaire is usually administered personally to groups of individuals. This has some advantages.
When several persons are available at the same time and place, a questionnaire proves to be a
very economical tool of data collection. Not only this, a questionnaire also enables researchers to
get first-hand information regarding the vagueness of items, if any, as well as gives them an
opportunity to establish a warm relationship with the persons being tested.


Wording of Questions in a Questionnaire 

A vast literature on the wording of questions in a questionnaire is available. A careful review of 
the literature reveals that the following factors are of immense importance for any behavioural 
research utilizing the questionnaire as a tool of data collection. 


1. Simplicity in language: The aim of the investigator in wording a questionnaire is to
communicate effectively with the respondents in their own language. In choosing the language for
the questions, he must keep in view the population for which it is meant. If the sample is taken
from a general population, technical terms and jargon should be avoided. If, on the other hand,
the sample is taken from a professional population, the terms appropriate to the population may
be used. While one is wording the questions meant for the general population,
the words chosen should be informal, simple and convey the exact and same meaning to all the
respondents. For example, it is more appropriate to ask: “Do you think that ......?” than “Are you
of the view that ......?” or “Do you intend to think that ......?”
Investigators should, in general, avoid complex and long questions because such questions
require more effort on the part of the respondents in answering them and many respondents
may not be ready for them. Such questions are also taxing in terms of time. Although complex
and long questions should be avoided, this does not mean that the short questions are necessarily
better. Marquis (1969) conducted one experiment in which the effectiveness of a questionnaire
(health questionnaire) was examined as a function of the length of the questions. The same
questions were prepared in short forms and in long forms. The findings indicated that the answers
to the longer questions were in close agreement with reports of the respondents’ physicians, thus
suggesting that longer questions had a higher validity than the shorter ones.

2. Ambiguity: The investigator must take pains to avoid ambiguous questions because such
questions do not convey the same meaning to all the respondents, and therefore, different
respondents may give different answers to the same question. For example, suppose the question
is: “Do you feel shy when you are in a group?” A person may feel shy when he is in a mixed-sex
group but may not feel shy in a same-sex group. He may, therefore, answer “Yes” or “No”
depending upon his attitude. When he answers “No”, what is really meant is difficult to tell.
Likewise, double-barrelled questions may introduce ambiguity and hence, should be avoided.
Suppose the question is: “Do you enjoy travelling in a car and bus?” A respondent who likes
travelling in a car and dislikes travelling in a bus would be in a dilemma and, therefore, is likely to
be guided by his discretion. In reality, for such double-barrelled questions, two independent
questions should be given so that the respondent is never placed in a state of dilemma and can
answer according to his likes and dislikes. Sometimes no particular wording is acceptable
to all. In such a situation, the split-ballot technique is useful, where different wordings
of the same questions are used for different equivalent groups of respondents and subsequently,
their responses are compared for knowing the effect of the wordings.

3. Vague words: Vague words should also be avoided because they encourage vague
answers. Words like ‘often’, ‘generally’, ‘fairly’, ‘on the whole’, etc., should be avoided unless the
investigator is interested in vague answers. Vagueness is also introduced by the use of ‘why
questions’. Lazarsfeld (1935) has discussed in detail how such questions introduce vagueness.
For example, suppose the question is: “Why do you not want to educate your wards in a
co-educational institution?” The answer may depend upon a host of factors, which may apply to
some respondents and may not to some others. Therefore, nothing definite can be concluded
about the respondents.

4. Embarrassing questions: When respondents are asked questions about private matters or
regarding matters which they do not want to make public, they are embarrassed. For example,
questions relating to sexual behaviour, stealing, cheating in examinations, etc., may be
embarrassing to respondents and hence, they will either refuse to answer or distort their true
answers. Some investigators have suggested methods to deal with the embarrassing nature of
questions so that their threat is reduced. One simple method for doing so is not to ask such
questions directly from the respondents but to ask them to express their views about others. Likewise,
sentence-completion techniques may also be adopted to reduce the threatening nature of
embarrassing questions. Oppenheim (1966) used the technique of sentence completion for
studying the attitude of the psychiatric nurses towards a mental hospital and found the technique
quite effective in reducing the threatening nature of embarrassing questions.

5. Double negatives: Double negatives should be avoided. Such negatives tend to cancel 
each other and, therefore, create confusion for the respondents. For example: Do you not 
approve the idea that a college girl should not engage herself in domestic affairs? 

6. Leading questions: A leading question should also be avoided. By definition, a leading
question is one which, by virtue of its content and structure, leads to a specific answer. For
example, the form of the question: “Should something not be done about...?” usually leads to a
positive answer and the question form: “You don’t think.... do you?” usually leads to a negative
answer. Question builders should avoid such leading questions. Payne (1951) and Cantril (1944)
have investigated the role of leading questions and have concluded that there are numerous
words which, if inserted in the question form, may lead to a specific answer. For example, the
word “involved” in a question like: “Do you think that the party should get involved in....?”
generally leads to an answer in the negative. Sometimes the words leading to a specific answer
may be so subtle that they go unnoticed. The questionnaire builders should take special care
when using subtle leading words.

7. Presuming questions: Presuming questions are those which presume something about
the respondent. In other words, the question should not be such that it necessarily implies that the
respondent possesses the knowledge regarding the theme of the question or that he has
participated in those activities which are being asked about. Suppose a housewife who has never used
‘nutri nuggets’ (a kind of vegetable protein food prepared from soya) is asked: “Do you find nutri
nuggets tasty?” This is one example of a presuming question, which should be asked only after a
housewife has used the product.

8. Hypothetical questions: Hypothetical questions are of little value because the
respondents’ answers towards such questions do not reflect anything concrete. The form of the
questions: “Would you like to....?” and “What do you do if....?” illustrates hypothetical questions
and as far as possible, such wordings should be avoided.
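
The split-ballot technique mentioned under ambiguity (point 2) can be sketched as a comparison of the answers given to two wordings of the same question. The sketch below assumes Yes/No answers from two equivalent groups and uses the Pearson chi-square statistic for a 2 x 2 table. The counts are invented for illustration, and 3.841 is the standard critical value for one degree of freedom at the 0.05 level.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts (no zero margins)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = wording A, wording B; columns = "Yes", "No"
ballot = [[60, 40], [45, 55]]
stat = chi_square_2x2(ballot)

CRITICAL_05_DF1 = 3.841          # chi-square critical value, df = 1, alpha = 0.05
wording_effect = stat > CRITICAL_05_DF1
print(round(stat, 3), wording_effect)
```

A statistic above the critical value suggests that the two wordings draw different answers, so the wording itself is influencing the responses.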


Characteristics of a Good Questionnaire 
A good questionnaire must have the following characteristics: 

1. The questionnaire should be concerned with specific topics, which must be regarded as 
relevant by the respondents. The investigator must clearly state the significance, 
objectives and aims of the questionnaire either in a separate letter or in the 
questionnaire itself. 

2. The questionnaire should, as far as possible, be short because very lengthy 
questionnaires often find their way into the wastebasket. 


3. Directions and wordings of the questions should be simple and clear. Each question 
should deal with a single idea. 


4. The questions should be objective and should not provide any hints or suggestions 
regarding a possible answer. 


5. Embarrassing questions, presuming questions and hypothetical questions should
be avoided.

6. The questions should be presented in a good order, proceeding from general to specific
responses, or from those showing a favourable attitude to an unfavourable attitude.

7. Lastly, a questionnaire must be attractive in appearance, neatly printed or duplicated
and clearly arranged.
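
Several of the wording rules discussed in this section lend themselves to a rough mechanical screening of draft questions. The following is only a crude sketch under stated assumptions: the word list comes from the vague words named in the text, the other checks are simple pattern matches chosen for illustration, and the function name is hypothetical. Such a screen cannot replace a careful reading or a pilot test.

```python
import re

# Vague words named in the text; the list is illustrative, not exhaustive
VAGUE_WORDS = ("often", "generally", "fairly", "on the whole")

def check_wording(question):
    """Return a list of warnings for one draft question (crude heuristics only)."""
    text = question.lower()
    warnings = []
    for word in VAGUE_WORDS:
        if re.search(r"\b" + re.escape(word) + r"\b", text):
            warnings.append("vague word: " + word)
    if text.startswith("why "):
        warnings.append("'why' question")
    # Two or more negations in one question suggest a double negative
    if len(re.findall(r"\bnot\b|n't\b", text)) >= 2:
        warnings.append("double negative")
    # An 'and' joining two objects may signal a double-barrelled question
    if re.search(r"\band\b", text):
        warnings.append("possibly double-barrelled")
    if text.startswith("would you") or text.startswith("what do you do if"):
        warnings.append("hypothetical question")
    return warnings

print(check_wording("Do you enjoy travelling in a car and bus?"))
```

Every flag is a prompt for the investigator to look again at the item; many perfectly sound questions contain the word "and".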


Order of Questions 


In a questionnaire, the order of the individual questions is of great significance because the order
influences the validity of the obtained answers as well as the refusal rates (Cantril 1944;
[author illegible] 1950). In the beginning, the respondent is unsure of himself as well as curious.
So, the opening questions should be simple, general, and such as to put the respondent at ease.
This helps in establishing rapport between the investigator and the respondent. No sensitive
question or embarrassing question should be put in the beginning because it is likely to lead to
refusal to answer. Then the questions should move from the general to the specific aspect in a
logical manner. As far as possible, no break should be given in moving from general to specific
questions. But where the break becomes essential, the investigator should explain the significance
of the break and the questions that follow in one or two sentences. Preferably, the sensitive
questions which cannot be avoided should be placed in or near the end so that if refusal is met,
relatively only a few questions are left unanswered.


Kahn & Cannell (1975) have recommended that it is most wise to start with some broad
questions relating to the topics and then gradually narrow down to the specific questions
relating to the topics. They have named such a sequence of questions a funnel sequence.
Currently, the funnel sequence of questions has become the standard norm for questionnaires in
behavioural researches.
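
The ordering advice of this section (broad questions first, specific ones later, sensitive ones at or near the end) can be expressed as a simple sort. The sketch below is an illustration only: the specificity rating and the sensitive flag are assumed to be assigned by the investigator, and the class, field names and questions are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    specificity: int         # assumed rating: 1 = broad, higher = more specific
    sensitive: bool = False  # assumed flag set by the investigator

def funnel_order(questions):
    """Broad questions first, specific ones later, sensitive ones at the end."""
    # False sorts before True, so non-sensitive questions come first
    return sorted(questions, key=lambda q: (q.sensitive, q.specificity))

# Hypothetical draft items in arbitrary order
draft = [
    Question("How much did you spend on medicines last month?", 3, sensitive=True),
    Question("Which clinic did you visit most recently?", 2),
    Question("How would you describe your health in general?", 1),
    Question("What was the reason for that visit?", 3),
]
ordered = [q.text for q in funnel_order(draft)]
for line in ordered:
    print(line)
```

The sort only arranges the items; deciding how broad or how sensitive a question is remains a matter of judgement.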


Functions of a Questionnaire 
A questionnaire generally performs two functions as given below.

1. Description: One of the basic functions of a questionnaire is to describe the individual or
group characteristics. In other words, the questionnaire provides description about age, sex,
marital status, occupation, income, political affiliation, religious affiliation, membership to some
civic group or corporation, etc. These pieces of information, in turn, serve many purposes of the
investigator or researcher. For example, if a researcher is able to know the age distribution of a
group of employees in a university, he may be able to draw better inference and explanation for
certain group behaviours such as clique formation, intergroup rivalry, etc.

2. Measurement: Another important function of a questionnaire is the measurement of
individual and/or group variables like attitude, opinion, personality traits, etc. The questionnaire
may consist of several items which aim at assessing such attitude, opinion, traits and habits of
the persons.

Types of Questionnaires 

Questionnaires, as used in behavioural researches, can be classified on the basis of two
dimensions: (a) type of response required, and (b) type of questionnaire administration.
Based upon the type of response required, the questionnaire may be of the following
two types:
1. Fixed-response questionnaire
2. Open-end questionnaire

A discussion of these two types of questionnaires is given below.

1. Fixed-response questionnaire: As its name implies, a fixed-response questionnaire is a
questionnaire which consists of statements or questions with a fixed number of options or
choices. The respondent is asked to check the option or response that best fits or suits him. Such a
questionnaire is also known as closed-form questionnaire or pre-coded type of questionnaire.
A few statements illustrating fixed-response questions are given hereunder.

(i) Do you feel shy in talking to members of the opposite sex? Yes/No
(ii) Do you like to entertain members of the opposite sex in a club? Yes/No
(iii) Do you like to have a member of the opposite sex as one of your shopping
partners? Yes/No


One of the basic assumptions to be made behind the use of a fixed-response questionnaire is
that the target sample has an adequate knowledge of the subject matter of the questionnaire.
Another assumption is that the researcher has enough knowledge about the sample under
investigation so that he can easily anticipate what kinds of responses are likely to be given.

2. Open-end questionnaire: An open-end questionnaire is a questionnaire which consists
of questions that require short or lengthy answers by the respondents. Usually, here the answers
are longer than those given in the fixed-response questionnaire. The following examples
illustrate open-end questions:

(i) What are the causes of student unrest?
(ii) What methods do you recommend for improving discipline on the university campus?

There are questionnaires that are made up of items having both fixed and open-ended
questions. Champion (1969) constructed a questionnaire in which both types of questions were
used. This questionnaire had been administered to a group of seniors at a small college.

Based upon the method of administering questionnaires, the following are the two common
types of questionnaires:



1. Mail questionnaire: A mail questionnaire is a questionnaire which is mailed to the
designated subject with a request to answer the questions and return it through mail. Instructions 
for completing the questionnaire are usually enclosed and a return envelope is also provided. 
Generally, the researcher waits for a fortnight or so for the reply. A survey conducted in this area 
has revealed that about 70% of the questionnaires mailed are not returned. 

2. Face-to-face administered questionnaire: The face-to-face administered questionnaire is
one where the selected subjects are given questionnaires with instructions to complete them in
the presence of the investigator or his associates. This type of questionnaire is more common than
the mailed questionnaire. Face-to-face administration of a questionnaire is usually preferred
where subjects for the study are readily available at one place.



Advantages and Disadvantages of Fixed-response Questionnaires 


A fixed-response questionnaire has both advantages and disadvantages. The important
advantages are given below.

1. Fixed-response questions or items are easily scored and coded. This facilitates the
statistical calculation and helps the researcher in arriving at a conclusion soon.

2. Such questions or items require no writing from the respondents, because they are
simply to check the response that applies to them. Such items or questions become more
advantageous to those respondents who, due to some reason, can’t adequately express
themselves verbally.

3. Fixed-response questions or items usually take less time in their completion. Even
lengthy questionnaires with fixed-response items can be completed rapidly in less time
than those questionnaires that require written answers for the same type of information.
If such a questionnaire is sent to the subject by post, the researcher can expect that it will
be returned by the subject soon because such questions take least effort and time in their
completion.
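
The ease of scoring and coding claimed for fixed-response items can be seen in a minimal sketch. It assumes three Yes/No items of the kind illustrated earlier, coded Yes = 1 and No = 0. The respondent labels, the answers and the coding scheme are hypothetical.

```python
from collections import Counter

CODES = {"Yes": 1, "No": 0}   # assumed coding scheme for Yes/No items

def code_responses(answers):
    """Map each 'Yes'/'No' answer to its numeric code; reject anything else."""
    try:
        return [CODES[a] for a in answers]
    except KeyError as err:
        raise ValueError(f"unexpected response: {err.args[0]!r}") from None

# Hypothetical answers of three respondents to three fixed-response items
respondents = {
    "R1": ["Yes", "No", "Yes"],
    "R2": ["No", "No", "Yes"],
    "R3": ["Yes", "Yes", "Yes"],
}
coded = {rid: code_responses(ans) for rid, ans in respondents.items()}
totals = {rid: sum(codes) for rid, codes in coded.items()}
item1_tally = Counter(ans[0] for ans in respondents.values())
print(totals, dict(item1_tally))
```

Because every answer falls into a pre-coded category, totals and frequency counts follow directly, with no judgement required from the scorer.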


Despite these advantages, such a questionnaire has some limitations as mentioned below. 

1. One important disadvantage of the fixed-response item is that, here, the researcher
remains unable to provide the respondent with all relevant response alternatives. If the
respondent is forced to make a choice among several such alternatives that, in fact, don’t
fit him, the resulting information will be misleading for the researcher.

2. A fixed-response questionnaire sometimes encourages the respondent to adopt some
kind of response set or bias. In an attempt to get over it quickly, the respondent may
check only the first option and ignore others in most of the items or questions. In such a
situation, again, the final information will be misleading for the researcher.
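
The response set described in the second limitation (checking the first option on most items) can be screened for after the data are collected. The sketch below is illustrative only: it assumes each answer is stored as the position of the option checked (0 for the first option), and the 0.8 cut-off, the function names and the data are hypothetical choices.

```python
def first_option_share(choices):
    """Proportion of items on which the first option (index 0) was checked."""
    if not choices:
        raise ValueError("no answers given")
    return sum(1 for c in choices if c == 0) / len(choices)

def flag_response_set(all_choices, threshold=0.8):
    """Return the respondents whose first-option share reaches the threshold."""
    return [rid for rid, choices in all_choices.items()
            if first_option_share(choices) >= threshold]

# Hypothetical data: option index checked on each of ten items
all_choices = {
    "R1": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "R2": [0, 1, 2, 0, 1, 1, 2, 0, 1, 2],
    "R3": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
}
flagged = flag_response_set(all_choices)
print(flagged)
```

A flagged respondent is not necessarily careless, since the first option may genuinely apply on every item; the screen only marks protocols that deserve a closer look.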


Advantages and Disadvantages of Open-end Questionnaires

An open-end questionnaire also, like the fixed-response questionnaire, has both advantages and
disadvantages. The following are the major advantages of open-end questionnaires:

1. Open-end questionnaires have been found to be particularly beneficial when the
investigator has little or no information about the subjects to be studied.

2. Since an open-end questionnaire provides a greater degree of flexibility, it enables
the investigator to elicit unanticipated and insightful replies from the respondents, which may
actually increase the researcher’s understanding of what is going on and why.

Despite these advantages, there are also some disadvantages of open-end questionnaires as
given below.

1. Several types of biases may operate in an open-end questionnaire. In the sample being
studied, some subjects may be adept in the art of self-expression while some subjects
may be poor in it. In such a situation the latter type of subjects will be unfairly combined
with the former type of subjects, and therefore, an educational bias will operate which
will produce misleading results. Socio-economic differences also tend to contribute to
misleading results. Subjects of different socio-economic backgrounds don’t necessarily
see things in one and the same way, nor do they use similar vocabulary to express
themselves.

2. Another disadvantage is that an open-end questionnaire is time-consuming as compared
to the fixed-response questionnaire. Not only this, if an open-end questionnaire is mailed
to the respondents, the response rate will be lower as compared to the situation where
a fixed-response questionnaire is used.

3. Open-end questions are difficult to score or code objectively. Different
respondents may appear to provide similar responses to the same item on a
questionnaire but the importance and meaning that each respondent attaches to his
answer may be different. Generally, an attempt is made to place several individuals in
one and the same category for the purpose of facilitating the analysis of data. The results
of such an analysis are usually misleading.


Advantages and Disadvantages of Mail Questionnaires 
For a discussion of advantages and disadvantages of mail questionnaires, the readers are 
requested to consult Chapter 15. 


Advantages and Disadvantages of Face-to-Face Administered Questionnaires 
In face-to-face administration, the questionnaire is administered to the people in the presence of the
investigator or researcher. This type of questionnaire has the following advantages:

1. A face-to-face administered questionnaire is less time-consuming. Here, the rate of
questionnaire completion is high, and a better return as compared to mail
questionnaires is expected.

2. The investigator knows well who is completing the questionnaire or answering the
questionnaire. This increases the validity of the collected information.

3. The researcher is present to answer any query that the respondent may raise during the
completion of the questionnaire.

4. A face-to-face administered questionnaire enables the investigator or researcher to put
probe questions, if any, for detailed analysis.


However, a face-to-face administered questionnaire has some limitations as indicated
below.

1. The presence of the investigator or researcher makes the examinees or the respondents
too conscious and that sometimes adversely affects the validity of the information thus
collected.

2. If the researcher or the investigator is required to travel throughout the city or state for
contacting respondents personally, the technique becomes much more costly and
time-consuming as compared to the mail questionnaire.

3. In such a questionnaire administration, the group of the respondents is often selected
according to accessibility and convenience. Therefore, such a group usually no longer
remains representative of the population. This affects the external validity of the
questionnaire.

Thus, we find that the different types of questionnaires have their relative advantages and 
disadvantages. Keeping in view these advantages and disadvantages, the researcher selects the 
appropriate ones for use. 


INTERVIEW 

The procedure for interview is different from that for the questionnaire, but both have the same 
aim, and it is to obtain data regarding the respondents with minimum bias and maximum 
efficiency. Interview is a face-to-face situation between the interviewer and the respondent, 
which intends to elicit some desired information from the latter. Thus an interview is a social 
process involving at least two persons, the interviewer and the respondent. For success of the 
interview, one must take care of the interaction between the interviewer and the respondent. The 
respondent's answer to the questions raised by the interviewer and his other behaviour serve as 
important clues to the interviewer and are likely to affect the behaviour of the latter. Likewise, 
during the course of the interview, the respondent tries to size up the interviewer and his 
inference about the interviewer is likely to influence his answers. Apart from these, the success of 
the interview is also dependent upon three important conditions, namely, accessibility, cognition 
and motivation, which have been discussed in Chapter 15. 


Types of Interviews 


There are two types of interview, namely, formal interview and informal interview. A formal 
interview may be defined as one in which already prepared questions are asked in a set order by 
the interviewer and answers are recorded in a standardized form. The formal interview is gaining 
much popularity today. The investigator aims at having interviews conducted in a uniform way. 
The formal interview is also known as a structured or patterned interview. Thus the formal 
interview is a systematic procedure for collecting information regarding the respondents and, 
therefore, it is not surprising that the reported validities for formal interviews are usually higher 
than those for the informal interviews. As the interview situation is highly structured, that is, the 
questions, their sequence and scoring methods are all predetermined, relatively less trained 
interviewers can also conduct such an interview smoothly. There are two important limitations of 
the formal interview. First, the procedures of conducting a formal interview are expensive and 
time-consuming. Therefore, a formal interview is conducted only where an informal interview 
cannot be adequate. Second, the validities of the formal interview are usually less than those 
obtained from some common methods of biodata analysis and standardized psychological tests 
(Guilford 1966). 

An informal interview is one where there are no pre-determined questions nor is there any 
preset order of the questions, and it is left to the interviewer to ask some questions in a way he likes 
regarding a number of key points around which the interview is to be built up. As most things 
depend upon the interviewer, the situation remains unstructured and, therefore, such an 
interview is also known as an unstructured interview. An informal interview is more commonly 
used than the formal interview and is a flexible method of collecting data. The primary advantage 
of the informal interview over the formal interview is that in the former, the interviewer can ‘dig 
deeper’ and thus, get a deeper understanding of the respondents’ behaviour. As the interviewer is 
left free to ask the questions, he can mould questions in such a way that may reflect the deeper 
aspects of the respondents’ personality. As a formal interview is relatively inflexible, the 
interviewer is bound to proceed with the set questions and thus, is deprived of the method of 
probing the respondents’ behaviour. Despite these, the informal interview has three important 
limitations. First, in an informal interview there is greater scope for personal influence and bias of 
the interviewer as compared to the formal interview. In other words, in the informal interview, 
there is ‘more of the interviewer’ than of the ‘standard question’ and hence, the informal 
interview is less reliable than the formal interview. Second, an informal interview requires greater 
skill on the part of the interviewer than the formal interview. The conduct of an informal 
interview requires that the interviewer be tactful, intelligent, and have a social sense as well as a 
deeper knowledge of the subject matter. Due to these requirements, the scope of the informal 
interview is limited. Third, the data obtained from an informal interview are difficult to quantify 
and analyze because of three inherent difficulties. One is that since interviewers are left free to 
put questions to the respondents, different questions are likely to be asked of different 
respondents. In such a situation, it is difficult to aggregate and summarize the results and draw 
any meaningful conclusion. Second, even if the interviewer asks more or less the same questions 
to each respondent, differences in language used by the interviewer are likely to make the 
responses not comparable statistically. Third, the results obtained from the informal interview are 
not amenable to statistical analysis. This further complicates the matter and puts the informal 
interview at a disadvantage. 


Major Functions of an Interview 


Interview as a research tool is selected basically because it serves two functions which mark it out 
with positive advantage from the rest of the methods of data collection. These two functions are: 


1. Description, and 2. Exploration. 


These two may be discussed as under: 

1. Description: An interview has been found to be particularly useful in providing insight 
into the interactive quality of social life. In an interview, people spend most of the time with one 
another in some form of verbal interaction. This verbal interaction helps the interviewer 
understand how people view the subject under investigation. This understanding helps him 
know their social life, which is otherwise abstract and merely a statistical phenomenon. 

2. Exploration: Another purpose of an interview is to provide insight into the unexplored 
dimensions of a topic or subject. A review of the work done in this area reveals that an interview 
helps a lot in exploring some new variables for study as it also helps sharpen the conceptual 
clarity. Talking with interviewees and thereby gaining insight into their conduct from inquiries 
about their behavioural dimensions provides adequate stimulation for development of various 
hypotheses for subsequent testing and research. 


Of the two functions of an interview, exploration is considered more important than 
description. 





There are also several other names which are applied to the formal and informal interview. For example, ‘standardized’ and 
‘unstandardized’; ‘structured’ and ‘unstructured’; ‘inflexible’ and ‘flexible’; ‘extensive’ and ‘intensive’; ‘controlled’ and ‘uncontrolled’ 
are some of the common names which have been used by different research investigators for formal and informal interviews 
respectively. 




Factors Affecting the Uses of Interviews 


Social scientists have recognized various factors that influence the usefulness of interviews. 
These can be grouped into three major factors: 

(1) Characteristics of interviewers 
(2) Characteristics of interviewees 
(3) Nature of the problem under study 

A brief description of these factors follows. 
(1) Characteristics of interviewers: A review of the work done in this area reveals that 
both the subjective and objective characteristics of the interviewers do influence the usefulness 
of the interview. Subjective characteristics are those which are peculiar to the individual and are 
extremely important where the major function of the research is exploratory. For the effectiveness 
of the interview, it is essential that the interviewers have inquisitive minds so that they are 
capable of readily attuning themselves to the newly emerging facets of a problem. When the 
interviewer possesses this quality, he can readily sharpen the questions, and redirect and 
concentrate on other matters if needed during the time of interview. The interviewer must also 
have the ability for drawing together the scattered pieces of information into a uniform and 
integrated whole, which helps him conduct the interview in a smooth way. 

Besides these subjective characteristics, objective characteristics of the interviewer are also 
important from the point of view of the effectiveness of the interview. The major objective 
characteristics of the interviewers such as sex, age, race, manners, clothing, culture, education, 
social class, speech, etc., are important in determining the effectiveness of the interview because 
they interfere with, rather than influence, the verbal dialogue that takes place between the 
interviewers and the interviewees (Gorden, 1969). 

In fact, any research problem incorporating the interview as a data-collection tool must take 
into consideration both these types of qualities of the interviewers. 


(2) Characteristics of the interviewees: Since an interview is a special form of 
conversation, it is expected that the characteristics possessed by the interviewees can also affect 
the effectiveness of the interview. There are two basic characteristics of the interviewees that tend 
to influence the effectiveness of the interview. 

(i) The first characteristic is the capacity of the interviewees to verbalize. Therefore, the very 
young and the mentally retarded or ill and others with extremely limited communication skills 
are not suitable as effective interviewees. Likewise, people with little formal training and those 
with relatively isolated personal conditions are not able to verbalize their views in a meaningful 
way and, therefore, are not effective interviewees. 

(ii) The second characteristic is the willingness of the interviewees, i.e., the interviewees 
must be willing to verbalize their viewpoints regarding the research problems. If not, they will not 
express their views about the questions asked by the interviewers. Hence, they are not 
considered as good and effective interviewees. Lane (1962) suggested that for increasing the 
willingness of the interviewees to be interviewed, they should be paid money for participating in 
the interview and it should be conducted on the interviewee’s terms. 
(3) Nature of the problem under study: The nature of the research problem or topic also 
influences the effectiveness of interviewing. In general, it is believed that when the research 
problems are such that touch the segments of the persons’ private lives and/or when the problems 
are such that they create special difficulties in verbalization, they tend to have an impact upon 
the scientific quality of interviewing (Cressey 1953; Sutherland & Cressey 1956). For example, 
people don't want to reveal how much money they earn from their work, nor do they 
want to verbalize the peculiarities or incidents of their sex lives. As such, interviewing about 
such research topics is bound to have an impact upon its scientific qualities. 










Thus, the effectiveness of interviewing is determined not only by the qualities of the 
interviewees and interviewers but also by the nature of the research problem. 


Advantages and Disadvantages of Interviews 
The interview as a research tool has both advantages and disadvantages. According to Gorden 
(1969), its major advantages are as follows: 
(1) An interview allows greater flexibility in the process of questioning. As such, many types 
of probe questions can be put and analyzed. 
(2) It facilitates the investigator in obtaining the desired information readily and quickly. 
(3) It facilitates the investigator in being sure that interviewees have themselves interpreted 
and answered the questions. This increases the validity of the conclusion arrived at. 
(4) In an interview, a desired level of control can be exercised over the situation or context 
within which questions are asked and answers are given. 
(5) The validity of the verbal information given by the interviewees can easily be checked on 
the basis of their nonverbal cues. 

Still, the interview is not without limitations or problems. Its major disadvantages are 
mentioned below. 

(1) Interviewer’s variability: It is commonly found that at times, the interviewer views 
similar responses differently and records them differently from interview to interview. Thus, he 
himself becomes a source of variation. It has been observed that the more the 
interviewer becomes a source of variation from interview to interview, the more the effectiveness and 
dependability of interviewing are adversely affected. 

(2) Inter-interviewer variability: Inter-interviewer variability is one of the major problems 
with interviewing. When several interviewers are used in a study and the nature of interview is 
unstructured, the interviewers vary considerably among themselves in their respective abilities to 
elicit the exact kind of information needed, project the proper kind of image, record the 
information appropriately, and so on. As the number of interviewers tends to increase, the 
problem of interviewers’ variability becomes all the more compounded and affects the real 
purpose of the interview. 
(3) Validity and dependability of verbal responses: In an interview, the interviewees 
verbally answer the questions asked by the interviewers. Social scientists have grave doubts 
whether a person actually behaves the way he professes to behave. They have expressed the 
concern that verbal responses can’t be relied upon with a considerable degree of validity and 
dependability. 

(4) Time: The interview takes much time in its completion because each respondent or 
interviewee is interviewed individually and the record of the verbal interaction of each 
respondent is kept individually. Sometimes, tapes are used for recording the proceedings of the 
interview for curtailing the time taken unnecessarily in recording by the interviewers. But even 
then, additional time is taken to transcribe information from tapes. 

(5) Variations inherent to the interviewing context: Some social scientists like Cicourel 
(1964) have shown that the interviewing context does not ordinarily remain constant as 
interviewers move from one interview to another. In such a situation, the investigator remains 
unable to depend upon the facts obtained from interviewing. Thus, the context of the interview 
itself becomes a variable which must be accounted for in assessing findings from studies that 
utilize interview as one of the important research tools. 

(6) Recording information: How to record information being given by the interviewee is 
also a problem in interviewing. No foolproof system of recording has yet been worked out to 
everybody’s satisfaction. Some prefer to jot down the main comments or points made at crucial 
junctures during interviewing, while others prefer to record the information in detail, and still 
others prefer to rely on some type of recording device for the interview. One advantage with 
recording the interviews is that there is no problem with remembering the chronological 
sequence of the interaction taking place between the interviewers and the interviewees. The 
preceding two methods of recording are subjective and vague. Since there is no standard way of 
recording information, the dependability of the interview becomes adversely affected in the 
absence of a standard criterion. 

Thus, there are several disadvantages associated with an interview. By making the interview 
structured, several of the disadvantages can be done away with, but even then little can be done 
to offset the limitation of time. 


Important Sources of Errors in Interviews 


Although the interview is commonly used as an important data-collecting device in 
behavioural sciences, some important sources of errors have been located by psychologists, 
sociologists and educationists. The major sources of such errors may be enumerated as follows: 

(1) Attitude of the interviewer: One major source of error in interviews is the attitude of 
the interviewers themselves. Sometimes the interviewers carry a definite attitude and bias, 
favourable or unfavourable, toward the respondents or interviewees. Due to this type of 
attitude or preconceived notions, the interviewers remain unable to arrive at a correct evaluation 
and they also commit several errors in recording the information given by interviewees. This 
unnecessarily creates problems in the sources of interviewing. 

(2) Incomprehensibility of the questions asked: Sometimes, the questions put by the 
interviewers are deliberately made difficult and incomprehensible. As such, the interviewees are 
unable to give correct answers. In such a situation, the data obtained is not dependable and the 
investigator is bound to arrive at a wrong conclusion. 

(3) Lack of warmth in the situation of the interview: Gorden (1969) is of the opinion that 
when the interviewers don’t exhibit an affectionate and warm relationship with the interviewees 
and, instead, exhibit an unfriendly and curt behaviour, the interviewees naturally don’t 
co-operate with the interviewers. This becomes a major source of error in the interview, and 
consequently, whatever data is thus collected lacks dependability and validity. 

(4) Lack of motivation in respondents: When the interviewees or respondents don’t 
possess proper motivation to answer the questions asked during the interview, it constitutes a 
major source of error in interviewing. The data obtained no longer remains representative of the 
true viewpoints of the interviewees who, in this situation, mostly give their answers as ‘Yes’ or 
‘No’. The investigator remains unable to arrive at any meaningful and definite conclusion. 





(5) Duration of interview: Sometimes the duration of the interview is unnecessarily long, 
causing the interviewees to feel nervous and bored. As such, the information given by 
them no longer remains dependable and the interviewers tend to arrive at erroneous conclusions. 

These sources of errors in interviews can be minimized by having skilled interviewers who 
manage and manipulate the procedure in a nice way. 


Selection and Training of Interviewers 
The success of a formal as well as an informal interview depends upon the adequate selection of 
interviewers and upon their training. Needless to say, good interviewing ability is one of the basic 
requirements for any interviewer. Apart from this, for a successful interview, it is essential that the 
interviewers possess the following personal characteristics: 
1. Honesty: The interviewer must be honest and scrupulous in order that the 
interviewer make a valid contribution to the research. Only when the work of an interviewer on 
several interview boards is compared and assessed are we able to know his 
level of honesty in interviewing practice. 




2. Accuracy: The interviewer should be accurate in recording the answers given by 
respondents. Not only this, in formal interviewing, he must also be accurate in following the 
sequence of the set questions. Interviewers who commit mistakes in carrying out their 
administrative duties should be avoided. 

3. Adaptability: By adaptability is meant the capacity to adjust in varying circumstances 
in which the interview is to be conducted. The interviewer may have to conduct an interview 
sometimes with persons living in slum areas/rural areas and sometimes with people living in 
urban areas. In such a situation, he must possess the trait of adaptability so that he can interview 
these different types of respondents without showing any prejudice. Those interviewers who 
have strong prejudices against some people should not be entrusted with the task of interviewing 
them. 

4. Interest: The quality of work done by interviewers is also affected by their interest. If 
they are not interested or least interested in the interviewing work, errors and poor performance 
are likely to show up frequently. The general observation is that if the interviewers are assigned 
the task of interviewing frequently, and if the task is a lengthy one, the interest of the interviewers 
is likely to deteriorate. 

5. Temperament: The temperament of the interviewers is likely to influence the output of 
the interview. They should not be emotionally involved in the problems presented by the 
respondents nor should they exhibit characteristics such as over-aggressiveness or 
over-sociability because either characteristic is likely to endanger the purpose of the interview. 

There is diversity of opinion regarding the training of interviewers. Some psychologists 
recommend a sort of formal training in which potential interviewers are given training for six to 
eight weeks with the help of either the lecture method or discussion method. There are some 
other experts who recommend on-the-job training for interviewers in which potential 
interviewers are asked to take their seat near experienced interviewers and simply watch them 
conducting a few interviews so that they can take over from them at a later date. 




CONTENT ANALYSIS 

Content analysis, sometimes known as document analysis, is a method of systematic 
examination of communications or of current records or documents. Instead of questioning 
respondents according to some scale items or observing their behaviour directly, the 
content-analyzer takes the communications or documents prepared by the respondents and 
systematically finds out the frequency or proportion of their appearances. 

In content or document analysis the primary sources of data are letters, autobiographies, 
diaries, compositions, records, reports, printed forms, themes or other academic work, books, 
periodicals, bulletins or catalogues, syllabus, court decisions, pictures, films, cartoons, etc. It is 
the obligation of the researchers to establish the trustworthiness of these data that have been 
drawn. Content analysis can also be used with responses of projective tests, with all kinds of 
verbal materials and with materials specially produced for research problems. 


Purpose of Content Analysis 
The following major purposes are served by document or content analysis: 
(i) To explain and describe the prevailing practices or conditions 
(ii) To identify concepts, beliefs, thinking and literary styles of a writer 
(iii) To locate and explain the possible causal factors related to some outcome or event 
(iv) To analyze the different types of errors in students’ work 
(v) To locate the level of difficulty of presentation in textbooks or other similar books 
(vi) To analyze the use of different symbols that tend to represent different institutions, 
countries or even different points of view 
(vii) To find out the relative importance of some topics or problems 
(viii) To make careful evaluation of bias, prejudice or propaganda in textbook presentation 


Methods of Content Analysis 

There are different ways or methods of content analysis. Of these various methods, Berelson’s 
(1954) method is the most useful and has been commonly applied by investigators. His method 
of content analysis may be presented under the three heads given below. 


Specification of the Universe 

In Berelson’s method, the first step is to define the universe or U of the content. Let us suppose 
that the investigator is interested in studying the effect of teachers’ strictness upon classroom 
discipline. In this problem, classroom discipline is the dependent variable and is the universe. For 
testing various hypotheses, the U should be divided or categorized into different subparts. There 
can be three important categories of classroom discipline: punctuality, attentiveness and 
completion of task by the students. Related to each of these subcategories, there can be 
three hypotheses: 

1. If the teacher is strict, the students will come to class in time. 

2. If the teacher is strict, the students will be attentive in their class. 

3. If the teacher is strict, the students will do their home task. 

Thus categorization of U directly spells out the main variables of the hypotheses and reflects 
the theory and problem of the study. 


Unit of Analysis 
Unit of analysis refers to the measure in terms of which content analysis can be carried out. 
Berelson (1954) has suggested five major units of analysis, namely, words, themes, items, 
characters and space-and-time measures. Words are the smallest unit of analysis, although 
sometimes letters (shorter than words) may also be used as units of analysis. Suppose the 
investigator wants to know the relation between the nature of words and the comprehension 
level of the students. The words may be categorized as easy, medium and difficult words, and 
each word of the list may easily be categorized into any one of the three categories, and 
subsequently, their comprehension level may be determined. The theme is another unit of 
analysis, which is often expressed in terms of sentence or proposition. Usually, themes are 
combined into sets of themes. Most of the time, themes are complex and in such a situation it is 
better to avoid content analysis because the analysis is likely to yield unreliable results. Despite 
this, the theme is a realistic unit and is very close to the original content if it is relatively simple. 
Suppose the investigator wants to measure the trait of self-disclosure. For this, the theme may 
consist of all those statements that use ‘I’, ‘My’, ‘Me’, ‘Mine’, etc. Likewise, suppose the 
investigator is studying child-rearing practices in a particular culture. Further suppose that he 
reports, “babies are breast-fed till three years of age and after that they are made solely dependent 
upon solid food”. The theme may be categorized as ‘late weaning’. Item is another important unit 
of content analysis. It refers to a whole production made by the subject towards a given stimulus. 
For example, short stories written by subjects towards the TAT pictures constitute an item. 
Likewise, a short autobiography or a short radio programme may be taken as the unit of an item. 
For behavioural research, item is one of the most important units of analysis. According to 
Berelson, character and time-and-space units are not important for behavioural researches. The 
character unit is commonly utilized by those who do research in the field of literature. In story 
analysis as well (which is true in the case of projective tests), the unit of character can be used. 
The time-and-space unit refers to the actual physical measurement of the content, e.g., the number 
of pages, number of discussions, number of photographs, etc. 










Quantification 

Quantification is the third important aspect in content analysis. It refers to the process of 
assigning numerals to the objects of the content analysis. Ordinarily, this process can be 
completed in any of the three ways, namely, nominal measurement, ordinal measurement and 
rating. In nominal measurement each object, after being assigned to a proper category, is 
counted. Such measurement is commonly applied to those situations where the number of 
objects is large. Ordinal measurement consists of ranking of objects done by subjects according 
to some fixed criterion. For example, subjects might be asked to rank a given set of human 
photographs according to the intensity of emotion expressed by them. Rating is another form of 
quantification, which resembles the second form. The whole reproduction or the object may be 
rated on several dimensions. For example, an essay written by a child may be rated on originality, 
fluency, organization, spelling, etc. 
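
The three ways of quantification described above can be illustrated with a small sketch in Python. This is only an illustration under stated assumptions, not a procedure given by Berelson: the function names, the category labels ('self_reference', 'other'), the use of the first-person words 'I', 'my', 'me' and 'mine' as markers of self-disclosure, and all the sample data are hypothetical.

```python
from collections import Counter
import re

# Hypothetical category scheme: first-person words treated as markers
# of a 'self-disclosure' theme (an assumption made for illustration).
SELF_REFERENCE = {"i", "my", "me", "mine"}


def count_categories(text):
    """Nominal measurement: assign each word to a category, then count."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(
        "self_reference" if word in SELF_REFERENCE else "other"
        for word in words
    )


def rank_objects(scores):
    """Ordinal measurement: rank objects on one criterion (1 = highest).

    Tied objects keep the order in which they were supplied.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def mean_ratings(ratings):
    """Rating: average the judges' ratings on each dimension."""
    return {dim: sum(values) / len(values) for dim, values in ratings.items()}


if __name__ == "__main__":
    # Invented sample data, used only to show how the functions are called.
    print(count_categories("I lost my keys and it upset me"))
    print(rank_objects({"photo_a": 2, "photo_b": 5, "photo_c": 4}))
    print(mean_ratings({"originality": [4, 5], "spelling": [2, 3]}))
```

In practice, the category scheme and the rating dimensions would come from the universe and the units of analysis fixed by the investigator beforehand, and the assignment of objects to categories would normally be checked across more than one coder.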


Evaluations and Limitation of Content Analysis 
Content analysis has some merits. The important merits are as follows: 

First, content analysis is applicable to the study of a wide variety of topics such as creativity, attitude, 
ethnocentrism, stereotypes, curriculum changes, values, interest, religiosity, college 
budgets, etc. 

Second, content analysis can also be used to examine the effect of experimental 
manipulation upon the dependent variables. If the investigator wants to study the effect of 
practice upon the improvement of handwriting of children, content analysis may be of no less 
importance than any experimental design. 

Third, content analysis is also used to validate other methods of observation. Suppose one 
wants to validate a self-disclosure inventory. It is expected that people, in general, would not like 
to give personal information against which the test can be validated. But subjects can be asked 
some projective-type of questions and the responses can be content-analyzed. Subsequently, the 
test can be validated against the content-analyzed response. 

Despite these merits, content analysis should be used with caution because of the 
complexities involved. 


OBSERVATION AS A TOOL OF DATA COLLECTION 


Meaning and Nature: In behavioural researches, questionnaires and interviews are very 
common and important types of data-collecting devices. But there are situations in which these 
devices can’t be used meaningfully. For example, when the investigator wants to see the 
behaviour in a natural situation and study the situation-based features of conduct, 
questionnaires and interviews no longer serve the purpose and some form of observation becomes 
indispensable. 

In a broad sense, the investigators constantly observe persons’ behaviour. For example, the 
investigators observe the behaviour of the persons in experimental situations, notice various 
expressions of the interviewees or respondents during the interview, watch people answering the 
questions or items of the questionnaire, and so on. Thus, all the investigators have some 
first-hand on-the-scene contact with the persons whom they are studying. But such observations 
are casual by-products of the investigators’ work, which must be distinguished from observation 
used as a fundamental data-gathering device. 

Observation, as a fundamental technique of data collection, refers to watching and listening 
to the behaviour of other persons over time without manipulating and controlling it, and recording 
the findings in ways that allow some degree of analytical interpretation and discussion. Thus, 
observation involves broadly selecting, recording and encoding behaviour for empirical aims of 
description or development of theory (Weick 1968). In fact, observation, when properly and 
scientifically conducted, is characterized by the following features: 

1. In observation there is a natural social context in which persons’ behaviour is studied. 
Thus, observation usually occurs in natural settings, although it can also be used in such 
contrived settings as laboratory experiments and simulations. 

2. It captures those significant events or occurrences that affect the relations among persons 
being studied. 

3. It identifies important regularities and recurrences in social life by comparing and 
contrasting the data obtained in a particular study with those obtained in the study of 
various natural settings. 

These characteristics make scientific and fundamental observation distinct 
from the casual and more or less spontaneous observations made by researchers in the course of 
conducting their investigations. 


Purpose of Observation 
Basically, observation, as a data-gathering device, has the following three purposes:
1. One major purpose of observation is to capture and study human behaviour as it actually
happens. It helps in snapshot comprehension of the activities of persons in real life or
social life.

2. Another purpose of observation is to provide a graphic description of real life that cannot be
acquired in other ways. There are many areas of life about which we have few
thorough descriptions and much is taken for granted about those areas by social
scientists. For example: How does a delinquent steal a motorcycle? How does a person
actually go about learning to be an engineer or doctor or politician or professor? The
descriptive base for all such life events is often provided only by observation.

3. Another purpose of observation is exploration. When the investigator observes human
behaviour in a real-life setting, he gets a good chance to explore those variables which
were important but overlooked. He also develops a tendency to look beyond what is
already known about the subject and to examine the possibility of some alternative
directions for research. Not only that, observation also aims at correcting some
methodological errors which otherwise might have been overlooked.

Thus, observation serves many useful purposes, of which description and exploration are
more important.


Important Types of Observation 
There are several ways of classifying observation. 


On the basis of the ability of observational data to generate useful and researchable information,
Reiss (1971, 4) divides observation into the following two types:

1. Systematic observation: Systematic observation is one which is done according to
explicit procedures as well as in accordance with the logic of scientific inference. A
psychologist studying the aggressive behaviour of children in their play group, with some
definite and explicit principles decided beforehand, is an example of systematic observation.

2. Unsystematic observation: Unsystematic observation is a type of casual observation made
by the investigator without specifying any explicit and objective inference. A psychologist
or sociologist observing the behaviour of people on a railway platform without any explicit
principles and procedures is an example of unsystematic observation.





Observation has also been classified on the basis of the role played by the investigator. On
the basis of this criterion, observation may be classified into participant observation and
nonparticipant observation. A discussion of these two is given below.

(A) Participant observation: As its name implies, in participant observation the
investigator actively participates in the activities of the group to be observed. Here, the
investigator may already be a member of a group or organization and decide to observe it
under one or more situations. Or, he may join the group for the express purpose of observing the
group under one or more situations. The procedure of participant observation is often
unstructured and, usually, the identity of the observer is not known to other members of the
group. This is called disguised participant observation. In this case, other members of the
group take him as an ordinary member and interact with him in a natural way. But sometimes the
persons who are being observed know that the observer is present for collecting information
about them. This is known as undisguised participant observation. Since the procedure here is
usually unstructured, the observer has some flexibility in deciding what to observe and how to
record it. Participant observation is usually used to provide descriptions that otherwise would be
unavailable. For example, in one of the most important and provocative studies by Rosenhan
(1973), disguised participant observers in a psychiatric hospital posed as patients and later
provided a good account of their experiences.

However, participant observation has both strengths and weaknesses. An account of its
strengths is given below.

(1) In participant observation, since the observation is done in a natural setting, the
investigator is able to record the behaviour in a realistic manner and, naturally, the analysis
then yields meaningful and convincing conclusions about human behaviour.

(2) Usually, a complete observation by the method of participant observation takes several
days and sometimes several months. As a consequence, whatever information is collected is very
broad and meaningful for understanding human behaviour.

Despite these strengths, participant observation has some limitations or weaknesses as
given below.

(1) Since participant observation is usually unstructured, it fails to be precise about the
procedures for data accumulation. According to Reiss (1971), in participant observation less
attention is paid to precision and more to discovery.

(2) Participant observation is a time-consuming device and, therefore, not all observers
are ready to adopt the procedures of participant observation.

(3) Since the observer participates in the activities of the group in an active manner, he 
sometimes starts showing human weaknesses like love, sympathy, hatred, etc., towards the 
members and their behaviour. This considerably jeopardizes the validity and dependability of the 
observation. 

(B) Nonparticipant observation: Nonparticipant observation is the observation in which
the investigator observes the behaviour of other persons in a natural setting but does not remain a
participant in the activities being observed. Nonparticipant observation is usually structured and,
therefore, the observer preplans the likely nature of the natural setting, representativeness of data,
problems associated with the presence of the investigator, etc. Here, the observer or the
investigator is able to go into the development of exploratory strategies or some specific research
questions for probing.

Nonparticipant observation also has some strengths and weaknesses. Its major points of
strength may be summarized as follows:





(1) Since nonparticipant observation is usually structured, the obtained data is more reliable
and representative. The observer clearly plans the different aspects and processes of observation
in a nice way.

(2) In nonparticipant observation the observer is able to concentrate upon any specified
aspect of social behaviour in a better way and, therefore, gets a better opportunity to find out the
solution of the related problem.

However, nonparticipant observation has some limitations as mentioned below.

(1) In nonparticipant observation, the behaviour of the persons being observed and the
setting do not remain natural. The persons develop the consciousness that their
activities are being observed. This consciousness slightly distorts the natural flow of their
behaviour. Since settings are structured, this also affects the persons being observed. But this
limitation is not considered very serious for want of evidence to the contrary. So far as
present knowledge goes, there is little evidence to show that intervention in the actual social
context, as happens in nonparticipant observation, creates any problem. Likewise, there is no
evidence that the presence of a nonparticipant observer tends to have any detrimental effect upon
the behaviour under study (Black & Champion 1976).

(2) Nonparticipant observation fails to capture the natural context of social settings to the
extent participant observation is able to capture it.

DIFFERENCE BETWEEN PARTICIPANT OBSERVATION AND NONPARTICIPANT OBSERVATION
Participant and nonparticipant observations are two different forms of observation. These two
types of observation differ as indicated below.

(i) Although both participant observation and nonparticipant observation are done in natural 
settings—in the former the observer or investigator actively participates in the activities of the 
group of persons being observed, whereas in the latter such active participation does not occur. 

(ii) Participant observation is usually unstructured, whereas nonparticipant observation is 
usually structured. Since participant observation is unstructured, the observer has a greater 
degree of flexibility in deciding what to observe and how to record it. 

(iii) In participant observation, the identity of the observer is often hidden and he is treated
as a member of the group; but in nonparticipant observation the observer is usually
known to the persons being observed. Therefore, there is little chance of concealing his identity.


This is how participant observation differs from nonparticipant observation. 


RATING SCALE
There are two classes of behaviour observation: observation of actual behaviour and
observation of remembered behaviour. In the former, actual behaviour is observed, that is, the
persons engaged in producing the behaviour remain physically present and interact
with each other. A group of college students solving a problem and teacher-pupil interactions
are examples of actual behaviour, and its observation is known as the observation of actual
behaviour. In remembered behaviour, the persons or objects engaged in producing the behaviour
do not remain physically present. However, they are symbolically represented. For example, a
person may be asked to recall the scene of a classroom in which the teacher is extremely
dominating. Such observation is called observation of remembered behaviour. The observation
of actual behaviour is easier than the observation of remembered behaviour because in the
former, the stimuli are ready-made and the observer has simply to make a decision on the basis of
what is before him, whereas in the latter, the observer has to take a decision on the basis of his
previous experiences and/or on the basis of his ability to perceive what an object looks like
(because it is not present physically). This is why remembered behaviour is also known as
perceived behaviour. The rating scale is a technique to assess both actual behaviour as well as
remembered behaviour.


A rating scale is defined as a technique through which the observer or rater categorizes the
objects, events or persons on a continuum, represented by a series of continuous numerals. The
purpose of a rating scale is to know what kind of impressions the objects or persons have made
upon the raters. It is, therefore, essential that the rater must have experience or knowledge of
those objects, events or persons. The experience may be direct (actual behaviour) or may be
indirect or remembered (perceived behaviour). A rating scale usually has two, three, five, seven,
nine or eleven points on a line with descriptive categories at both the ends followed sometimes
with a descriptive category in the middle of the continuum, too. An illustration is given as
follows:

Positive End:          Middle:          Negative End:
Strongly agree         Neutral          Strongly disagree

The rating scale has two components, namely, the stimulus variable and the response
options. The stimulus variable (S) consists of trait names or qualities to be rated and the response
options consist of numerical or descriptive categories. The rater or observer, who is often a trained
person, rates the objects, events or persons on the given scale according to his impressions and
thereby, provides a quantitative analysis of his observations. Ratings may be either retrospective
or concurrent. When they are retrospective, they tend to summarize all the impressions gathered
by the raters regarding the ratees over an extended period of time. But when they are concurrent,
they tend to summarize the impressions that are gathered as it happens in the case of an
interview.


The rating scales are very easy to construct and use. As such, a large number of rating scales
have been constructed. A problem which has been frequently noticed in the use of rating scales is
that the traits to be rated are not distinctively defined and may have different meanings for
different raters. This defeats the very purpose of ratings. To solve this problem, three precautions
should be taken in the construction of rating scales. First, each trait to be rated should be clearly
defined and explained with specific instances. Second, the various intervals or points on the scale
should be clearly defined. Usually, five to seven intervals are used in rating a trait, attitude and
other sentiments. When more than seven intervals are needed, there is likely to be a
problem of making finer distinctions, and in this task even the trained rater is not likely to
succeed. Hence, the intervals on the rating scale should be kept within this limit. Third, since overt
traits like leadership, honesty, punctuality, co-operativeness, industriousness and the like are
more reliably rated than covert traits such as ego-strength, job-satisfaction, emotional stability,
etc., an attempt should be made to ensure that rating scales are, as far as possible, concerned
exclusively with objectively observable traits.


TYPES OF RATING SCALES 


There are different types of rating scales. According to Guilford (1954, 263), rating scales are
divided into six categories, namely, numerical scales, graphic scales, percentage rating, standard
scales, scales of cumulated points and forced-choice scales.
A detailed discussion of these types is presented below.





Numerical Rating Scale

Numerical rating scales are the easiest to construct and apply to the objects, persons, events, etc.,
to be rated. In a numerical scale the observer or the rater is supplied with a sequence of numbers,
which is well defined, and his task is to rate the objects on the given sequence of defined numbers
on the basis of his impression. Sometimes, it is found that numerical scales have only a
description of the categories and no numbers are provided. After the rating by the observer, the
investigator assigns numerals to certain categories, e.g., 5 to ‘Strongly agree’, 4 to ‘Agree’, 3 to
‘Indifferent’, 2 to ‘Disagree’ and 1 to ‘Strongly disagree’. Sometimes a still finer
discrimination is needed in such a scale. Examples of such types of numerical scales are given
below.

Scales with Numerical Anchors:

Numerical anchors     Meaning
1                     Extremely disagree
2                     Strongly disagree
3                     Moderately disagree
4                     Mildly disagree
5                     Indifferent
6                     Mildly agree
7                     Moderately agree
8                     Strongly agree
9                     Extremely agree

Usually, the above numerical anchors along with their meanings are printed on the first page
with appropriate instructions and on the subsequent pages, a number of statements revealing
various impressions, attitudes, etc., regarding the objects, persons and events are printed.
Opposite each statement is provided a blank space where the rater writes simply that number
which he thinks to be the most appropriate one. A few examples of such items are given below.

1. Nationalization of the private sector would make the
country richer.                                               ______

2. Nationalization enables the government to make sound
policies regarding the development of the country.            ______





Scales without Numerical Anchors:
Strongly agree
Agree
Indifferent
Disagree
Strongly disagree

When the scale points are without numerical anchors, each statement is provided with all the
descriptive cues and the rater puts a tick mark only on that cue which he thinks is the most
appropriate. For example,

1. More and more countries should join the UNO.
Strongly agree     Agree     Indifferent     Disagree     Strongly disagree

2. Political leaders should obey the directions of the UNO.
Strongly agree     Agree     Indifferent     Disagree     Strongly disagree

Some of the numerical scales assign a rating of ‘0’ to the neutral point, +3, +2 and +1
to the different categories towards the positive end and -3, -2 and -1 towards the negative
end. An example is given below:

+3     Strongly agree
+2     Mildly agree
+1     Agree
 0     Indifferent
-1     Disagree
-2     Mildly disagree
-3     Strongly disagree


Of these three subtypes of numerical scale, the first subtype is very common and
popular. When the numerals are provided along with the different categories of scale
points, this may represent equal intervals in the mind of the rater and their ratings may come
nearer to interval measurement which, in turn, may facilitate some common statistical
analysis. In the case of the second subtype, this advantage is not obvious because numerals are
provided by the investigator after rating has been done. The difficulty with the
third subtype of numerical scale is that its scale points include negative numbers such as
-1, -2, -3, -4, and so on. Raters having a mathematical background may handle them, but
those who have no such knowledge are likely to be confused by the negative numbers.
Moreover, such scales have a point, ‘0’, in the middle, which may suggest to some raters that the
scale has a break in the middle and thus, may endanger the very continuity of the scale. For these
reasons, Guilford (1954, 264) opines that the use of negative rating numbers is not
recommended. The general demerit of the numerical rating scale is that the rater has to remember
the meanings of the numerical anchors (which are given only on the first page) while he is writing
the numbers in the box. The most common observation is that he forgets the meaning of the
number and sometimes writes the wrong number in the box. To avoid this difficulty, nowadays
most numerical rating scales are being designed in such a way that on each page the numerical
anchors along with their meanings are given.
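
The coding step described for the second subtype (numerals assigned to descriptive categories after the rating is done) can be sketched in Python as follows. This is only an illustrative sketch: the 5-to-1 assignment comes from the text, while the function name `code_responses` and the sample responses are hypothetical.

```python
# Numerals assigned by the investigator to descriptive categories
# (5 to 'Strongly agree' ... 1 to 'Strongly disagree', as in the text).
CATEGORY_TO_NUMERAL = {
    "Strongly agree": 5,
    "Agree": 4,
    "Indifferent": 3,
    "Disagree": 2,
    "Strongly disagree": 1,
}


def code_responses(responses):
    """Convert a rater's descriptive responses into numerals (hypothetical helper)."""
    return [CATEGORY_TO_NUMERAL[r] for r in responses]


# Hypothetical ratings given by one rater to three statements.
ratings = code_responses(["Agree", "Strongly agree", "Disagree"])
mean_rating = sum(ratings) / len(ratings)
print(ratings, mean_rating)
```

Averaging the coded values treats them as interval data, which rests on the assumption, noted above, that raters perceive the scale points as equally spaced.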


Graphic Rating Scale 


The graphic rating scale is the most popular and widely used rating scale. In one way, the graphic
scale may be considered as an improvement over the numerical scale because it tends to
overcome some of the difficulties faced with the numerical scale. On the graphic scale, the scales
are presented graphically, with descriptive cues corresponding to the different scale steps.
Items or statements here have no blank box and the rater simply puts either a tick mark or a
cross mark on any of the descriptive cues to indicate his view. An example of the graphic scale is
presented below.

(a) Mrs Shukla delivers lectures in the classroom:

|______________|______________|______________|______________|
Extremely      Tolerably      With slow      Sluggishly     Extremely
rapidly        rapidly        speed                         slowly

(b) In social gossip, she:

____________   ____________   ____________   ____________   ____________
Talks a        Talks          Talks when     Prefers to     Abstains
great deal     easily         absolutely     listen         from
                              necessary                     talking

In the above two examples, the graphic scales have been illustrated. The first example
illustrates the scale points through a continuous line whereas the second example illustrates the
points through broken lines. In both the examples, the scale points have been demonstrated
in a horizontal manner. They can, however, be demonstrated in a vertical manner as well.
Graphic scales having scale points arranged horizontally have one important limitation, that is,
they provide space for only the shorter descriptive cues. In vertical scale points, this limitation is
automatically overcome as the investigator can put as many descriptive cues as possible (because
there is no limitation of space). The Fels Behaviour Rating Scale is a good example of the vertical
graphic scale though it has some horizontal graphic scales, too.

There are some advantages of graphic scales. First, they have no numerical anchors. As such,
the rater experiences no confusion arising out of the need for numerical discrimination. Second,
they are simple, easily administered and quickly completed by the raters. Regarding their
disadvantages, it is said that graphic scales generally take time and labour in scoring.
Percentage Rating 
Percentage rating is done whenever the investigator wants a quick rating with maximum
uniformity from rater to rater. The technique requires the rater to place the ratees among different
specified percentage groups or into different percentiles or quartiles such as given below:

Highest 5 per cent
Second highest 5 per cent
Highest 25 per cent excluding the top 10 per cent
Top half but not the top 25 per cent
Lower half

Percentage ratings are common among teachers who are asked to rate their students in the
classroom in terms of overall performance. One of the serious limitations of percentage ratings is
that the rater may be quite generous and therefore, the rating may be influenced by the individual
differences in generosity among the raters.
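
As a rough sketch of how the percentage groups listed above partition the ratees, the Python function below maps a ratee's standing from the top (as a percentage) onto the five groups. The cut-offs of 5, 10, 25 and 50 per cent follow the list; the function name `percentage_group` and the convention that the input lies in (0, 100] are assumptions made only for this illustration.

```python
def percentage_group(percent_from_top):
    """Return the percentage group for a ratee's position from the top.

    percent_from_top: position of the ratee from the top of the group, as a
    percentage in the range (0, 100]. Hypothetical helper for illustration.
    """
    if not 0 < percent_from_top <= 100:
        raise ValueError("percent_from_top must be in (0, 100]")
    if percent_from_top <= 5:
        return "Highest 5 per cent"
    if percent_from_top <= 10:
        return "Second highest 5 per cent"
    if percent_from_top <= 25:
        return "Highest 25 per cent excluding the top 10 per cent"
    if percent_from_top <= 50:
        return "Top half but not the top 25 per cent"
    return "Lower half"


# Hypothetical example: a student judged to stand 18 per cent from the top.
print(percentage_group(18))
```

In practice the rater places each ratee in a group by judgment; the function only makes the boundaries between the groups explicit.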


Standard Scale 


Standard scales are not very popular rating scales for psychological measurement. Standard
scales were originally developed in a bid to have more uniformity of meaning of the scale points and
to get an unbiased rating on the basis of those scale points. A standard scale is one in which the
rater is presented with some standards with pre-established scale values. These standards are
usually a set of objects of the same kind, e.g., they may all be the names of persons. Two examples,
the Man-To-Man Scale and Portrait Matching, which are based upon the principles of the standard
scale, are given below.

The Man-To-Man Scale was developed during World War I and it used men's names instead of
numbers and adjectives or other descriptive cues to represent the various scale points. The
rater is asked to give the name of a person who is well known to him and who is very high on the
trait being rated. That person's name is then noted down to define the ‘very high’ point of the
scale. Likewise, the rater may then be asked to give the names of persons who are well
known to him and who are high, average, low, and very low on the trait being rated. All these
names are entered to define the four scale points: ‘high’, ‘average’, ‘low’ and ‘very low’. Thus,
the scale with its five scale points (very high, high, average, low and very low) is complete. The
rater is then given only those key names with pre-established five-point scale values to rate other
persons. Here the rater's task is very simple because he is required to compare the person or
persons (to be rated) with the five key persons (defining the five levels of any one trait)
in question. The value of rating corresponds to the value of that key-person whom the ratee resembles.
For example, if a person is being rated on the trait of dominance and if the rater assigns him the
name ‘Col. Shamsher’, the value of the ratee on the 5-point scale will fall in ‘high’ on the trait
of dominance. A hypothetical example to illustrate the Man-To-Man Scale is given below.

Scale Values     Name
Very high        Lt Chima
High             Col. Shamsher
Average          Brig. Kariappa
Low              Lt Bodra
Very low         Lt Gilani





There are several advantages of the Man-To-Man Scale. First, such a scale avoids the 
confusion arising out of abstract numerical anchors assigned to the traits to be rated. Second, if all 
the raters use the same key-man in their ratings, their ratings can be comparable both in absolute 
terms as well as in relative terms. Third, since the scale values of all the key-men are 
pre-established and fixed, raters cannot shift their standards from one day's ratings to the next. Three principal
disadvantages of the Man-To-Man Scale have also been suggested. First, the distance between the 
key-men of the scale is not equal. Second, actual practice has demonstrated that no two raters are 
alike in rating persons who are well known to them and this naturally endangers the second 
advantage discussed above. Third, deliberate overestimation and underestimation of persons by 
the raters are not controlled by the scale. 

Portrait matching is another rating scale which is based upon the principle of standard scale. 
The portrait matching technique was developed by Hartshorne & May in 1929 in the course of
their study on character. In this technique, a set of standards (or verbal sketches or portraits) for 
any given trait on which rating is to be done, is prepared. For constructing the verbal portrait 
regarding a given trait, a large number of verbal statements describing that trait are collected and 
each is written on a separate card. Subsequently, they are rank-ordered by a group of judges or 
experts. After that, a desired number of sketches or portraits are prepared on the basis of those 
statements, which have about the same average rank. The scale value of each portrait is also 
determined. For determining the scale value, the portraits are given to another group of judges for 
ranking them and the mean rank becomes the scale value. In this technique the rater is given all 
the verbal portraits to read (which consist of several synthesized statements regarding a particular 
trait) and then, he names the persons who belong to the portraits. The chances are that several
persons may be placed within a single portrait, and a single person may be placed under more than one portrait.
A person’s final rating is the average of all the portrait values that have been assigned to him by 
all the raters. The merit of portrait matching is that it eliminates the element of subjectivity, which 
is common in the Man-To-Man Scale but the most serious demerit is that it does not provide 
objective and realistic standard scale points. The portrait matching technique deviates from the
usual rating method in that it requires the rater to find persons to match the description (or
portrait), which is in contrast to the usual practice of asking the rater to find descriptions to match 
the persons. This automatically means that all names of the individuals to be rated should be 
placed before the raters and they should not be allowed to depend upon their ability to recall the 
names of individuals. This is certainly an arduous, if not impossible, task for the investigator. 
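
The two averaging steps in portrait matching (mean rank of the judges as the scale value of a portrait, and the average of all assigned portrait values as a person's final rating) can be sketched in Python as below. All names and numbers are hypothetical, and the direction of ranking (rank 1 = lowest on the trait) is an assumption made for the example, not something the text specifies.

```python
from statistics import mean

# Hypothetical ranks given by three judges to three verbal portraits
# (rank 1 = lowest on the trait; the direction of ranking is an assumption).
judge_ranks = {
    "portrait_A": [1, 1, 2],
    "portrait_B": [2, 3, 1],
    "portrait_C": [3, 2, 3],
}

# The scale value of a portrait is the mean of the ranks it received.
scale_values = {name: mean(ranks) for name, ranks in judge_ranks.items()}

# Hypothetical assignments: each rater names the portraits a person fits.
assignments = {
    "rater_1": {"Person X": ["portrait_B"], "Person Y": ["portrait_C"]},
    "rater_2": {"Person X": ["portrait_B", "portrait_C"], "Person Y": ["portrait_C"]},
}


def final_rating(person):
    """Average of all portrait values assigned to a person by all raters."""
    values = [
        scale_values[portrait]
        for by_person in assignments.values()
        for portrait in by_person.get(person, [])
    ]
    return mean(values)


print(final_rating("Person X"), final_rating("Person Y"))
```

Note that a person placed under more than one portrait, as with Person X here, simply contributes more values to the average.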


Scale of Cumulated Points
Rating scales based upon cumulated or summated points are the most common. Here the
ratee's total score is the sum of individual ratings or points assigned to all items of the scale.
Such points may be weighted or unweighted. At a glance such rating scales may seem identical
with psychological tests (where a person's total score is the sum of the individual scores on
the items of the test) but in reality, they differ in that psychological test items are answered by the
testees themselves whereas rating scale items, which are based upon summated ratings, are
answered by another person, that is, by the rater or observer. Rating scales based upon summated
points can be conveniently classified into two types, namely, the Checklist and the Guess-who
technique.
The Checklist method is one where the rater is supplied with a large number of specific 
behavioural statements (which serve as stimuli) and he is asked to check these statements, which 
describe the persons in question. The person (the ratee) is then characterized by only these 
statements which have been checked for him. Such a rating scale is known as a behaviour 
checklist. Subsequently, these assigned statements are scored. For example, one very convenient
way of scoring the checked statements is to score each favourable statement as +1, each
unfavourable statement as —1, and each neutral item as 0. The person’s total score on the checklist 
is equal to the sum of the scores for the items checked for him. The purpose of the behaviour 
checklist is to know whether certain specified traits or behaviours are present or absent in the 
individual being rated. A good example of the behaviour checklist is the Vineland Social Maturity 
Scale, which consists of items relating to self-direction, self-help, communication, socialization, 
etc. The checklist is filled in by the rater who knows well the children being rated. 
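
The scoring rule just described (+1 for each favourable statement, -1 for each unfavourable statement, 0 for each neutral one, summed over the checked items) can be sketched in Python as follows. The statements and their keying are hypothetical and are not taken from any published checklist.

```python
# Score values from the text: favourable +1, unfavourable -1, neutral 0.
ITEM_SCORES = {"favourable": 1, "unfavourable": -1, "neutral": 0}

# Hypothetical checklist: each statement is keyed as favourable,
# unfavourable or neutral by the investigator.
CHECKLIST = {
    "Helps other children": "favourable",
    "Dresses without help": "favourable",
    "Quarrels frequently": "unfavourable",
    "Prefers indoor games": "neutral",
}


def checklist_score(checked_items):
    """Sum the scores of the statements checked for one ratee."""
    return sum(ITEM_SCORES[CHECKLIST[item]] for item in checked_items)


# Statements the rater checked for a hypothetical child.
checked = ["Helps other children", "Quarrels frequently", "Dresses without help"]
print(checklist_score(checked))
```

Only the checked statements contribute to the total, so an unchecked favourable statement and a checked neutral statement both add nothing.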


There are several variations of behaviour checklists. The items of the checklist may be in the 
form of two-point responses (Yes—No, True—False) or in the multiple-choice form. The sample 
illustrating these two varieties is given below. 


Sample item of a two-point response:

The picture was interesting     Yes     No
or
The picture was boring          Yes     No

Sample item of a multiple-choice response:
The girl was
______ smart
______ intelligent
______ beautiful
______ ugly
______ submissive
The second sample item contains only adjectives as the response options and the stimulus
(the girl) is to be rated on those adjectives. Such a behaviour checklist is, therefore, also known as
the adjective checklist. On either of the above two types of behaviour checklists, each positive
rating is given a score of +1 and each negative rating is given a score of -1. In order to get a total
score, all the positive scores are summed together and from them the sum of all the negative
scores, taken together, is subtracted. Thus the resultant total score represents the positiveness of
the rater's attitude.

In a fair evaluation of the behaviour checklist it can be said that despite its simple technique
and attractive procedure, it has not been a very widely adopted method. One of the serious
limitations of the checklist is that the rating is influenced by various kinds of response biases of
the rater, who is simply asked to check the items in an all-or-none fashion (Guilford 1954).







The Guess-who technique, also known as Casting Characters, is another type of rating scale
where the total score is based upon cumulated points. The technique was developed by
Hartshorne & May in 1929 to be used primarily with children. The technique consists of verbal
descriptions (or verbal portraits) of the various roles played by children in a group. The verbal
descriptions are usually in the form of one or two sentences. The child raters are asked to name
the other children (or their peers) who fit or match certain verbal descriptions, mentioning the
same child as many times as they think appropriate. Each favourable description is given a point
and all such points are summed to get a total score. A few sample items of this technique are
given below.


Here is one who is always worried. 

Here is one who is always happy. 

Here is one who is always discouraging others. 
Here is one who is always helping others. 
Here is one who never likes to do a task. 


The dangers involved in using the Guess-who technique are similar to those of the
portrait-matching method. The technique obviously deviates from the normal practice of asking
the rater to choose the verbal description which matches the individual (because here the rater is
to find the individual to match the verbal description). This technique, thus, obviously requires
that the rater be supplied with names of all the individuals (or children) to be rated, and the rater
should not be left to depend upon his ability to memorize the names of the individuals. This is an
almost impossible task for the investigator.
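
The point-counting in the Guess-who technique can be sketched in Python as below. The text states only that each favourable description earns a point; how unfavourable descriptions are treated is not specified, so this sketch simply ignores them (an assumption). The children's names, the nominations and the keying of descriptions as favourable are all hypothetical.

```python
from collections import Counter

# Hypothetical keying of the sample descriptions as favourable or not.
FAVOURABLE = {
    "Here is one who is always happy.": True,
    "Here is one who is always helping others.": True,
    "Here is one who is always worried.": False,
}

# Hypothetical nominations: (rater, description, child named).
nominations = [
    ("Asha", "Here is one who is always happy.", "Ravi"),
    ("Asha", "Here is one who is always helping others.", "Ravi"),
    ("Binod", "Here is one who is always happy.", "Meena"),
    ("Binod", "Here is one who is always worried.", "Ravi"),
    ("Chitra", "Here is one who is always helping others.", "Ravi"),
]


def guess_who_scores(records):
    """Total points per child: one point per favourable nomination."""
    scores = Counter()
    for _rater, description, child in records:
        if FAVOURABLE[description]:
            scores[child] += 1
    return scores


print(guess_who_scores(nominations))
```

Because the same child may be named many times, a child's total reflects how many raters and how many favourable descriptions pointed to him or her.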


Forced-choice Rating Scale 


A forced-choice scale represents a major departure from the pattern of rating scales discussed
above. All the above techniques consider one attribute at a time and place the ratees in any one of a
set of categories. In the forced-choice rating scale, the rater is given a set of attributes in terms of
verbal statements for a single item and he decides which one or ones represent the individual
being rated most appropriately and accurately. The items of the forced-choice scale may have
several alternatives: two, three, four or five. Of these, the two-alternative and the four-alternative
forms are the most common. In the two-alternative form, both the statements regarding the
attribute are either favourable or unfavourable. However, only one of the statements regarding the
attribute in either case is valid to identify desirable or undesirable attributes, though both of them
may appear equally favourable or unfavourable to the rater. In the four-alternative form, two
varieties are common: in one variety all the four statements are desirable or undesirable, and in
the second variety two statements are favourable and two statements are unfavourable. When all
the four statements are favourable or unfavourable, it does not mean that all of them have equal
discriminative value, or in other words, that they are equally valid in identifying desirable or
undesirable attributes, though they may appear to be so to the rater. This automatically reduces
the probability of a favourable response bias operating in the rater. The rater may be asked to
select any two statements which are most descriptive and representative of the persons being
rated. An example of the four-alternative forced-choice rating scale is given below.


Miss George 
(a) lectures with confidence 
(b) keeps the students interested and motivated 
(c) cares a great deal for the slow learner 
(d) entertains suggestions from students to improve her lecture 
In the above four statements (all favourable), the most discriminating statement is (c) and the
least discriminating statement is (a). The most discriminating statement may be assigned a score
of 2 and the least discriminating statement a score of 0. A ratee’s total score would be the sum
of such scores assigned to the most discriminating statements in each set. The students must note
carefully that the assignment of a score value to the most and least discriminating statements is not
random or dependent upon the will of the investigator. As a matter of fact, it is determined
on the basis of some preliminary investigations. A comparative study of the forced-choice
technique having all favourable statements and having two favourable and two unfavourable
statements has revealed that the former is good for two reasons. First, the rater fully co-operates
with such a form and second, the probability of faking or distortions is low whereas the same
probability is high in the case of the latter type of form, that is, having two favourable and two
unfavourable statements (Freeman 1962).
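
The scoring of a forced-choice item can be sketched as a lookup of predetermined weights. In the sketch below the weights 2 for statement (c) and 0 for statement (a) come from the ‘Miss George’ example; the weight 1 for statements (b) and (d), the function name and the sample choices are assumptions made only for illustration.

```python
# Weights for the 'Miss George' item. The text gives 2 for (c) and 0 for (a);
# the value 1 for (b) and (d) is an assumption made only for illustration.
ITEM_WEIGHTS = [
    {"a": 0, "b": 1, "c": 2, "d": 1},
]

def forced_choice_total(choices, item_weights=ITEM_WEIGHTS):
    """choices: one collection of selected statement letters per item.
    Returns the sum of the weights of all selected statements."""
    if len(choices) != len(item_weights):
        raise ValueError("one set of choices is needed per item")
    total = 0
    for selected, weights in zip(choices, item_weights):
        total += sum(weights[letter] for letter in selected)
    return total

# The rater picks the two statements judged most descriptive of the ratee.
total = forced_choice_total([("c", "d")])
print(total)
```

Because the weights are fixed in advance by preliminary investigation, the rater cannot see which choices raise or lower the total.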
There are several advantages and disadvantages of the forced-choice rating scale. Its main
advantages are given below.

1. The forced-choice rating scale minimizes the generosity (or kindliness) error. Since the
items of the scale are equally favourable (or unfavourable), a rater who may be very kind and
generous to the ratees is not given any opportunity to choose one rather than the other and thus, his
tendency to rate generously is automatically controlled.

2. The tendency on the part of the rater to be influenced by a single favourable or
unfavourable trait, which subsequently colours his whole judgement, is also automatically
controlled. If the rater wilfully wants to bias the score in any one direction (either downward or
upward), he is least allowed to do so because in the forced-choice scale, the rater is usually
unable to identify the favourable choice or the unfavourable choice (because the options appear
to be equally favourable or unfavourable and therefore, he is unable to rate a person up or down).

3. When all the statements of the forced-choice scale are equally attractive, the rater’s bias
is automatically reduced and this has a very marked effect upon the distribution of ratings, that is,
the distribution of the ratings becomes symmetrical and normal.

The main disadvantages of the forced-choice rating scale are given below.

1. Raters are critical and hesitant to do ratings on the forced-choice scale because this
gives them limited freedom in ratings and imposes too many restrictions upon them.

2. Sometimes, the raters believe that none of the statements actually describes the person
appropriately even though they have to give their judgement. Obviously, such judgement, then,
would not be representative of the rater’s true decision.

Although the forced-choice scale has acquired much prominence in past years, its
importance today has been considerably reduced, partly due to the raters’ hesitant and critical
attitudes and partly due to the difficulties involved in constructing the scale.


OTHER SPECIAL TYPES OF RATING SCALES 

Under this heading, some special types of rating scales, which have recently acquired
prominence in the field of measurement, will be discussed. The Q-sort, the semantic differential
scale, the behaviourally anchored rating scale, sociometry, etc., are a few examples. A brief
discussion of these special types of rating scales is presented below.





Q-sort Technique 
The Q-methodology was originally devised by William Stephenson in 1953 for the study of
verbalized attitudes, preferences, self-description, etc. The procedure of the Q-methodology is
known as the Q-sort, which involves a comparative rating method rather than an absolute rating
method. In fact, it is a method of ranking attitudes or judgements and is particularly effective when
the number of items to be ranked is large. In the Q-sort technique some objects such as verbal
statements, pictures, words, phrases, etc., are given to an individual or a group of individuals to
sort out into a set of piles (usually nine or eleven piles) according to a fixed criterion. A limited
number of piles is thus an example of forced distribution of sorting. Each object (whether it is a
verbal statement or a picture or a word, etc.) is printed on a separate card. For statistical facility
and reliability, the number of cards in any Q-sort study should not be less than 60 or more
than 140 (Kerlinger 1973). The sorter(s) is (are) instructed to sort a fixed number of cards into
each pile so that the resulting whole distribution may approach a normal or quasi-normal
distribution. The purpose of approaching a normal distribution of cards among the different piles
is to facilitate statistical calculation. As Nunnally (1970, 449) has said, “The use of an
approximately normal distribution rather than some other fixed distribution (e.g., a rectangular
distribution) is justified in the general sense because (1) so many things in nature are distributed
approximately that way, and (2) it fits in with the statistical methods applied to the data.” One of
the basic principles of Q-sort is that as far as practicable, the items or contents should be
homogeneous. This is because in Q-sort, the investigator seeks precise comparative responses
elicited by a large number of stimuli and if all these stimuli are not homogeneous, the
comparative responses would make no sense. Two examples of the fixed distribution (a kind of
normal distribution) are cited below to illustrate the points.
Q-sort distribution of 100 items (pictures, drawings, words, etc.)

     2    4    8   12   14   20   14   12    8    4    2
    ------------------------------------------------------
    10    9    8    7    6    5    4    3    2    1    0
    Most                                             Least
    preferred                                    preferred

Q-sort distribution of 90 items (verbal statements)

     3    4    7   10   13   16   13   10    7    4    3
    ------------------------------------------------------
    10    9    8    7    6    5    4    3    2    1    0
    Most                                             Least
    preferred                                    preferred


From both the preceding illustrations, it is obvious that Q-sort requires a rank-ordered
continuum ranging from the ‘Most’ to the ‘Least’ dimension with varying degrees of subdimensions
in between these two extremes. The numbers above the line indicate the number of cards to be
sorted in each pile. The numbers below the line indicate the total number of piles into which the
objects are sorted and the values to be assigned to each card placed into different piles. For
example, ‘10’ indicates the ‘most preferred’ pile, and 2 pictures (the number above the line)
are to be sorted in this pile and each of these pictures would be given the value of 10.
Similarly, 9 indicates the next pile and 4 pictures are to be sorted into this category, each
receiving the value of 9. The centre pile (5th category) is a neutral pile and the sorter is instructed
to sort all those objects, which are either vague or difficult to be sorted into any one pile, into this
neutral pile.
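
A forced distribution of this kind can be checked mechanically. The sketch below assumes a 90-card quota of 3, 4, 7, 10, 13, 16, 13, 10, 7, 4 and 3 cards for the pile values 10 down to 0; the function name and the sample sort are hypothetical.

```python
# Cards allowed in each pile, keyed by pile value (10 = most preferred).
# Assumed 90-card quota; the counts are symmetric around the centre pile.
QUOTA_90 = {10: 3, 9: 4, 8: 7, 7: 10, 6: 13, 5: 16, 4: 13, 3: 10, 2: 7, 1: 4, 0: 3}

def check_sort(piles, quota):
    """piles: dict mapping pile value -> list of card ids placed there.
    Returns a dict card -> value if the sort respects the quota, else raises."""
    values = {}
    for pile_value, allowed in quota.items():
        cards = piles.get(pile_value, [])
        if len(cards) != allowed:
            raise ValueError(
                f"pile {pile_value}: expected {allowed} cards, got {len(cards)}"
            )
        for card in cards:
            if card in values:
                raise ValueError(f"card {card} placed in more than one pile")
            values[card] = pile_value  # each card receives the value of its pile
    return values

# Hypothetical sort: cards 0..89 are dealt into the piles in numerical order.
piles, next_card = {}, 0
for pile_value, allowed in QUOTA_90.items():
    piles[pile_value] = list(range(next_card, next_card + allowed))
    next_card += allowed

values = check_sort(piles, QUOTA_90)
print(sum(QUOTA_90.values()), len(values), values[0], values[89])
```

The returned mapping of cards to pile values is the raw material for the correlational analysis of different sorters.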

When sorting is over, statistical analysis is done. Two statistical analyses are common: the
analysis of variance, and the coefficient of correlation computed between the sortings of the
different sorters. The Q-data can be analyzed for one individual only or the data for the group of
individuals can be analyzed. The computation of the coefficient of correlation has been
illustrated from hypothetical data presented in Table 12.1 (10 pictures sorted by 5 persons into
11 piles). Since there are five sorters, there would be

\[ \frac{N(N-1)}{2} = \frac{5 \times 4}{2} = 10 \text{ pairs of sorters} \]

and accordingly, there would be 10 coefficients of correlation. Table 12.2 presents the
correlation matrix (value of Pearson r) prepared on the basis of Table 12.1.


Table 12.1 Q-sort values of five sorters 


  Items*                   Sorters
               A      B      C      D      E

    1          4      4      8      7      ?
    2          2      2      5      5      2
    3          5      4      9      8      4
    4          0      0     10     10      0
    5          4      3      1     10      4
    6          6      5      0     10      5
    7          3      ?      1      1      3
    8          3      3     10     10      3
    9          4      3     10     10      3
   10          3      4      9     10      4

*In practice, the number of items to be sorted is far greater than 10.


Table 12.2 Correlation matrix from the Q-sort values of 5 sorters presented in Table 12.1

              A         B         C         D         E

    A         -       0.932    -0.343     0.264     0.939
    B       0.932       -      -0.363     0.292     0.978
    C      -0.343    -0.363       -       0.329    -0.293
    D       0.264     0.292     0.329       -       0.318
    E       0.939     0.978    -0.293     0.318       -

From Table 12.2, it is clear that the correlation matrix is diagonally symmetrical and hence
can be read either on the basis of the upper diagonal or on the basis of the lower diagonal. Of 10
coefficients of correlation, three coefficients are very high and the remaining are below the
moderate level. The high coefficients are 0.932 (between A and B), 0.939 (between A and E) and
0.978 (between B and E). This means that three sorters, namely, A, B and E, sort the pictures in the
same way. Now, to know their exact way of sorting, the investigator is again required to go back
to the Q-sort data and examine whether they preferred the lower piles or the upper piles.
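
The pairwise correlation of sorters can be sketched in a few lines. The data below are hypothetical (three sorters and six cards, not the values of Table 12.1), and the function and variable names are assumptions; the point is only to show that N sorters yield N(N - 1)/2 coefficients of Pearson r.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical pile values given by three sorters to the same six cards.
sorts = {
    "P": [5, 4, 3, 2, 1, 0],
    "Q": [5, 3, 4, 2, 0, 1],
    "R": [0, 1, 2, 3, 4, 5],
}

pairs = list(combinations(sorts, 2))  # every unordered pair of sorters
matrix = {
    pair: round(pearson_r(sorts[pair[0]], sorts[pair[1]]), 3) for pair in pairs
}
for pair, r in matrix.items():
    print(pair, r)

# With five sorters the same enumeration gives 5 x 4 / 2 = 10 pairs.
print(len(list(combinations("ABCDE", 2))))
```

Only one triangle of the matrix is computed, since the coefficient for a pair does not depend on the order of the two sorters.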
Kerlinger (1973) has divided Q-sorts into the following two principal categories.


Unstructured Q-sorts

An unstructured Q-sort is based upon the random sampling of content or materials (Nunnally
1979). The random sampling of content is different from the usual technique. Here the
investigator selects a wide variety of representative contents or materials (for sorting) from
different sources such as photographs of the different statues collected from different books,
magazines, etc., different statements about psychotherapeutic progress in clinics, statements
about mental illness, etc. No theory, variable or specific factor is assumed to underlie those
contents. An attempt, however, is made to keep all the contents or items for any one Q-sort
homogeneous as far as practicable. For example, all the different photographs and/or drawings
may be placed together (excluding other contents like statements, words, phrases) for sorting.
Thus, in an unstructured Q-sort, the items or contents are selected to represent one domain and
no specific theory or variable or factor is said to underlie those items. Moreover, such items are
not partitioned further in Q-sort analysis. The principal analysis focuses on the correlations
among persons and on factor or cluster analysis.


Structured Q-sorts 

A structured Q-sort is one in which all items, like in an unstructured Q-sort, belong to a single
domain. However, these items are said to constitute a variable or hypothesis and during analysis
they can be partitioned in one or more than one way. These two characteristics make the
structured Q-sort different from the unstructured Q-sort. In a structured Q-sort, the investigator
stipulates the contents or items in terms of an experimental design. When the design is such that it
permits the partition of items on the basis of a one-variable classification, it is referred to as a
one-way structured Q-sort, and when the design is such that it permits the partition of items on
the basis of a two-variable classification, it is referred to as a two-way structured Q-sort. An
illustration of a one-way structured Q-sort is as follows: Suppose the investigator decides to make
a Q-sort study of attitudes towards nationalization. He might select some statements (say 60
statements) revealing attitudes towards nationalization. Subsequently, two categories of
individuals, one consisting of people from the private sector and another
consisting of people from the public sector, may be selected to act as sorters. If the popular
belief that nationalization leads to overall improvement is true, the people of the public sector are
expected to place the favourable statements (towards nationalization) high and the unfavourable
statements low while sorting. The opposite expectation is due from people of the private sector.
This type of study illustrates a one-way structured Q-sort because there is one basis of
classification. To illustrate a two-way structured Q-sort, two variables, attitude and
specificity, might be investigated. An attitude item may indicate either a liberal attitude or a
nonliberal attitude, and at the same time it may be either a specific or a general item. Some liberals
might sort specific liberal items as high, whereas other liberals might sort general liberal items as
high and so also the nonliberals. A schematic representation of a 2 × 2 factorial design thus
constituted has been shown in Table 12.3.


Table 12.3 A two-way structured Q-sort with the number of items in each cell

                         Specificity
                     Specific    General

    Liberals            25          25
    Nonliberals         25          25


The investigator might select 25 items which are both liberal and specific, 25 that are both
liberal and general, and so on, for the bottom two cells of a 2 × 2 table. The above example
illustrates the two-way structured Q-sort. Usually, the two-way structured Q-sort is very helpful in
formulating a certain theory because the factorial design permits two variables to be related in a
logical and empirical way (which is the essence of any theory).




The Q-methodology is controversial. As such, some have praised it and some have criticized
it. Some of its merits, on the basis of which the technique has been commended, are as given
below.

1. The chief merit of Q-sort is that it permits the subjects to make comparative responses
towards a large number of stimuli. Ordinarily, it has been found that the reliability (usually the
retest reliability) of the comparative responses obtained from Q-sort is higher than the reliability
of the responses obtained from other methods based on absolute ratings. This is one of the
important advantages of Q-sort.

2. Q-sort, particularly the two-way structured Q-sort, is very helpful in formulating a
theory or in testing the principles of a theory because it allows the manipulation of variables in a
logical and empirical way.

3. Q-sort data allow the operation of correlational methods as well as the analysis of
variance. As such, the data facilitate statistical computation as well.

4. Since Q-sort provides a challenging and realistic task, the technique appears to be
interesting to many.

The primary limitations of Q-sort are as given below. 

1. Q-sort assumes without any reasonable proof that a normal distribution (or a
distribution approaching normal) best fits the rating of each sorter.


2. Q-sort only provides an estimate of comparative ratings for each trait. It does not say 
anything about the strength of each trait within the individual. 


3. Q-sort does not provide scope for the inclusion of a cross-sectional or larger sample.
Moreover, in this technique the investigator cares little for drawing a random sample of persons.
Due to the above two reasons, the conclusion of Q-sort can be generalized to a larger population
only with great risk.


4. Q-sort allows a forced-choice sorting which has several adverse repercussions. Many 
individuals feel irritated with the forced-choice ratings as it does not allow a free play of attitude, 
and at the same time requires them to conform to an unnatural, unrealistic and unreasonable 
requirement. In forced-choice Q-sorting, when the coefficient of correlation is computed (which 
is usually done), the information regarding the elevation (mean) and the scatter (standard 
deviation) of the set of scores is lost. Not only this, the forced-choice technique affects the sorting 
of the Q-cards, that is, the placement of one Q-card affects the placement of another Q-card. This 
automatically means that the Q-sort, being a forced-choice technique, violates the assumption of 
independence which underlies most statistical tests. 
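
The loss of elevation and scatter can be seen in a small sketch. The scores below are hypothetical; the second set is the first one doubled and raised by 10, so its mean and standard deviation differ, yet its correlation with a third set of scores is unchanged.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

ratings = [1, 2, 2, 3, 5, 6]               # hypothetical scores of one person
shifted = [2 * v + 10 for v in ratings]    # same ordering, higher mean, wider spread
other = [0, 3, 1, 4, 4, 6]                 # hypothetical scores of a second person

r1 = pearson_r(ratings, other)
r2 = pearson_r(shifted, other)

print(mean(ratings), mean(shifted))        # elevation differs
print(pstdev(ratings), pstdev(shifted))    # scatter differs
print(abs(r1 - r2) < 1e-9)                 # the correlation does not register either
```

In a forced-choice Q-sort every sorter has the same mean and standard deviation by construction, so this information is not merely ignored by the coefficient but absent from the data.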

Despite these weaknesses, the Q-sort is a very useful research technique for problems, 
which require comparative reactions towards a larger set of stimuli. 


Semantic Differential Scale 


The semantic differential (SD) scale is an attitude measuring device developed by Osgood et
al. (1957). The scale grew out of the studies of Osgood and his colleagues on the mediational
theory of learning where the ‘meaning’ of stimulus is one of the central factors. While they were
studying the structure of the domain of meaning represented by adjectives, they needed an
instrument which could measure the various facets of meaning. The SD scale is such an
instrument. The semantic differential scale is similar to the Likert scale in that the
respondent indicates an attitude or opinion between two extreme choices.

The SD scale may be defined as a collection of subscales (each anchored at the two ends by
bipolar adjectives) in which absolute ratings of concepts are done. The ratings are absolute
in the sense that the scale does not permit a comparative rating of responses. In the SD
scale, the term ‘concept’ refers to the object which is to be rated. The purpose of SD is to
measure the various facets of meaning of the concept, the various facets of meaning being
represented by adjectives. The reason why adjectives are selected to convey the meaning of a
concept is that in our day-to-day language, most of our ideas are appropriately communicated
only through adjectives.
For understanding the SD scale, it is essential to know the meaning of the term ‘meaning’.
Meaning is a very general term and it includes the various reactions of people towards an object.
There are three facets of meaning: denotation, connotation and association. Denotation refers
to the description of object(s) in terms of their physical properties. For example, an elephant may be
described as being a big fat animal, mostly blackish, having a long trunk in the front, etc.
Connotation indicates the sentiments and feelings of persons about an object. It thus indicates the
general implications that the object has for the person. The person may describe an elephant as ‘a
very dangerous animal’. Association refers to those objects which come to the mind of the person
when he has heard or seen a particular word. Thus, a person might think of words like ‘wild’,
‘jungle’, ‘tiger’, ‘lion’, etc., after listening to the word ‘elephant’. The semantic differential scale is
a measure of mainly the connotative facet of meaning. Very little evidence is
available to show that even a denotative facet of meaning can be employed in a semantic
differential scale.

In the semantic differential scale, the concept is usually rated on a seven-point scale
having bipolar adjectives at the two extremes. Some have used a nine-point scale but a
seven-point scale is more common and effective. For children, a five-point scale is sufficient. The
first step in constructing the SD scale is to choose the concept. In doing so, two primary
considerations are important. First, the concept must be an object or stimulus which can elicit
different responses from individuals. If the concept is such that all people respond in the same
way (that is, the variance in response among persons is zero or nearly so), the concept is not worth
investigating. Second, the concept must be relevant to the problem being investigated. For
studying the political atmosphere of a country, concepts like ‘Student’, ‘Teacher’,
etc., would be irrelevant. However, for studying the educational
environment these concepts would be most relevant.

When the concept has been selected, the next step is to choose the scales or adjective pairs.
Again, there are two important considerations to be kept in view. First, it must be decided by the
investigator as to what factor(s) should represent the adjective pairs. Osgood et al. (1957) have,
on the basis of their factor analytic studies, discovered three kinds of factors, namely, Evaluation
(E), Potency (P) and Activity (A) for the purpose. These factors are called clusters of adjectives and
represent three dimensions of meaning along which a concept can be measured. These
dimensions are technically known as semantic space. E factor is defined by pairs of adjectives
like Good-Bad, Fair-Unfair, Clean-Dirty, Honest-Dishonest, and so on; P factor is defined by
pairs of adjectives like Strong-Weak, Large-Small, Hard-Soft, Dominant-Submissive, and so on;
and A factor is defined by pairs of adjectives like Hot-Cold, Active-Passive, Tense-Relaxed,
Quick-Slow, and so on. Of these three factors, the E factor is the strongest because the pairs of
adjectives of this factor have sharp bipolar extremes, that is, all pairs have very clear-cut positive
and negative extremes. The investigator may wish to have scales of E factor or he may have scales
of any two factors or include all the three factors in preparing the SD scales. In the study of
attitude or value, it is wise to include only the scales of E factor. Besides these three factors,
sometimes the investigator may wish to include scales of other factors like Tautness, Warmth,
Familiarity, etc., but these are very uncommon factors. Second, the pairs of adjectives must be
relevant to the concept being rated. For the concept ‘War’, pairs like Honest-Dishonest,
Sweet-Sour, and so on would be irrelevant pairs but pairs like Heavy-Light, Quick-Slow,
Hostile-Amiable and the like would be relevant pairs.
In the final preparation of the SD scale, each concept is written on a separate sheet of paper
(preferably in the top-middle of the bipolar adjectives) with the same set of scales and the subject
is to rate the concepts as he (or she) sees them. An example of the SD scale with ‘Teacher’
as the concept is given in Table 12.4.
Table 12.4 A sample of semantic differential scale

                                 TEACHER

    1. Original      ___:___:___:___:___:___:___   Conventional   (P)
    2. Reserved      ___:___:___:___:___:___:___   Outspoken      (E)
   *3. Passive       ___:___:___:___:___:___:___   Active         (A)
   *4. Slow          ___:___:___:___:___:___:___   Quick          (A)
    5. Good          ___:___:___:___:___:___:___   Bad            (E)
   *6. Shallow       ___:___:___:___:___:___:___   Deep           (P)
    7. Fair          ___:___:___:___:___:___:___   Unfair         (E)
   *8. Dull          ___:___:___:___:___:___:___   Sharp          (A)
   *9. Warm          ___:___:___:___:___:___:___   [illegible]    (P)
   10. Systematic    ___:___:___:___:___:___:___   Erratic        (P)
  *11. Worthless     ___:___:___:___:___:___:___   Vital          (E)
   12. Successful    ___:___:___:___:___:___:___   Unsuccessful   (E)

*Meaning given in the text.


In Table 12.4, a sample of 12 scales of the concept ‘Teacher’ has been shown. Items bearing
asterisks indicate that the pairs of adjectives have been reversed at random in order to reduce
response biases like thoughtlessness and stereotyped reading. The letters in the brackets
at the right indicate the factor to which the pairs belong.


Like Q-data, SD data can be analyzed for one individual only as well as for a group of
individuals. The scores on the individual scales are first located and then summed up to find out
the means. Subsequently, the means are compared. If the two (or more than
two) means are similar, it is indicated, then, that the two concepts are alike in meaning for the
individual(s); if the two means vary, it is indicated that the two concepts (which yielded those two
means) differ in semantic space. In such a case Osgood et al. (1957) have recommended
the use of a special statistic called the D statistic, which is a measure of the linear distance
between any two concepts. On the SD scale, a score means simply the assignment of the numeral
checked by the subject.

Fair 7 : 6 : 5 : 4 : 3 : 2 : 1 Unfair

For example, if the person checks a concept on a pair of fair-unfair, say, in the third set of
dots from the right, the score on this scale of fair-unfair would be 3. Similar procedures are adopted
in scoring all the scales of the semantic differential.
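
The D statistic is commonly given as the square root of the sum of squared differences between the scores of two concepts, taken over the scales or factors. The sketch below assumes that form; the concept profiles (mean scores on the E, P and A factors, on a 1 to 7 scale) and the function name are hypothetical.

```python
from math import sqrt

def d_statistic(profile_a, profile_b):
    """Linear distance between two concept profiles measured on the same scales."""
    if len(profile_a) != len(profile_b):
        raise ValueError("both concepts must be scored on the same scales")
    return sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))

# Hypothetical mean scores of two concepts on the E, P and A factors (1 to 7).
teacher = [6.0, 4.0, 5.0]
student = [4.0, 4.0, 3.0]

d = d_statistic(teacher, student)
print(round(d, 3))
```

A value of zero means that the two concepts occupy the same point in semantic space; larger values mean that they are further apart in meaning.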
r Mantic diff 


The semantic differential scale has some advantages and disadvantages. The scale provides
a convenient and quick way of gathering impressions on one or more than one concept.
Since the ratings are summed over the different scales, they tend to average out the peculiarities, if
any, among the scales as well as provide a basis for finer discriminations among the individuals.
The most common disadvantage of the semantic differential scale is that the appropriateness of
pairs of adjectives is questionable and little consensus exists among the experts regarding the
suitability of the pairs selected. It is also said that often the responses given by the subjects are at a
superficial and verbal level. For example, a person might value the concept ‘Teacher’ as high but
might not really be attaching a higher value or meaning to the teacher. This criticism is, however,
not unique to the SD scale; rather, it is common to all attitude-assessment techniques.


Behaviourally Anchored Rating Scale 

These rating scales are relatively new ones and through them, performance in a
job or a profession is evaluated. The development of the scale is a lengthy process and is done in
three stages. The first stage is one where a group of experts, usually through discussion, decides
upon the possible dimensions, which can be used as the aspects of the performance being
evaluated. Suppose, for example, the academic performance of a college is to be evaluated, and
a group of experts consisting of students and teachers decides the following dimensions to be
important and appropriate ones:
1. Sincerity of students 
2. Observational ability of the teachers 
3. Organizational ability of the teachers 


4. Appropriate syllabus 

Several statements regarding each dimension are prepared and this marks the beginning of
the second stage. One example of the critical incident (or statement) of the dimension of
observational ability is: when most of the pupils do not understand the lecture, the teachers
should immediately sense this and try to find out where the defect lies. In this way, several
statements regarding the different dimensions are prepared. Thus, the critical incidents are
prepared in the form of small events to represent the superior, average and inferior performance
of each dimension. Usually, the critical incidents are prepared by the experts. Subsequently, a
group of raters is assigned to sort each critical incident to the different dimensions to which it
belonged. On the basis of perfect or near-perfect agreement the incidents are finally selected and
the incidents showing least agreement are discarded. The third stage consists of calculating the
scale value and consistency of each critical incident. For this purpose, statements are given to a
group of judges who rate all these statements on a scale of excellence. The median value of rating
of each statement is calculated. The median value becomes the scale value for each statement.
Finally, only those statements are retained that display higher agreement among judges
(indicating a higher degree of consistency) and that yield a wide range of average scale values. In
this way, the scale is ready.
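
The third stage can be sketched as follows. The median of the judges' ratings is taken as the scale value of each statement, as described above; the use of the standard deviation as the index of agreement, the cut-off of 1.0, the function name and the ratings themselves are assumptions made only for illustration.

```python
from statistics import median, pstdev

def scale_values(ratings, max_spread=1.0):
    """ratings: dict mapping each statement to the list of judges' ratings.
    Returns {statement: median rating} for statements on which judges agree."""
    kept = {}
    for statement, judged in ratings.items():
        if pstdev(judged) <= max_spread:      # assumed agreement criterion
            kept[statement] = median(judged)  # median becomes the scale value
    return kept

# Hypothetical ratings by five judges on a 1 to 7 scale of excellence.
ratings = {
    "notices confusion at once": [6, 7, 6, 7, 6],
    "repeats the lecture unchanged": [2, 1, 2, 2, 3],
    "asks one student to explain": [1, 7, 4, 2, 6],
}
kept = scale_values(ratings)
print(kept)
```

The third statement is dropped because the judges disagree widely about it, while the two retained statements anchor the high and low ends of the scale.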

Behaviourally anchored rating scales have some advantages and disadvantages. Campbell
et al. (1973) have shown that the behaviourally anchored rating scales, as compared to other
rating scales, yield less halo error, less leniency error and less variance attributable to the
methodology. The common disadvantages are that the procedure of the construction of the scale
is time-consuming and lengthy, and the scale loses the trait of simplicity due to its
methodological sophistications.


Nominating Technique—Sociometry 
A nominating technique is a technique in which each person names OF nominates other person 
events, objects, subject matters, which are perceived as fitting into certain categories OF 
‘tuations. The technique is commonly applied for studying social choices and rejections 
ociometry is one type of a nominating technique which ts used in the study of group structure 
social status and personality traits. The basic principles of soclometry were first enunciated by 





ij ce niguies of Observation and Data Collection 301 
4 avolume entitled Who Shall Survive? , Published jn 
asa method ol discovering and evaluating group 
ve through measuring the acceptance or rejection betw 
ma we of evaluating interpersonal relationship ina BfOup. Stanley & Hopki 

eT fini sociometry a8 “the study of INterrelationship among i is ise ria a3) 
a structure: how each individual is perceived by the group.” With the hel ao — a : : 
chnicue the data relating to the choice, comm “e “ ometic 
(EL rm 


| unication and interaction Jatterns of indivi 

e Pallie : r rs) . ‘ms of ind 
in groups are athered anc analyzed (Kerlinger 197° . 1986) f 1 individuals 
in Groh /3 


in studying the structure of the group throu 
equired to make one, two or three choices ( 
equi 


; i , for other persons for a 
specified purpose. For example, ina group of schoolchildren each child may be asked with 
whom he/she would like to sit or go to the circus or play, or who are the three best (or worst) 
pupils in class? In a factory, a worker may be asked to nam 


: Maas | e the three individuals in order who 
command the maximum prestige in their Broup. Or, a worker may be asked to na me two persons 


who have certain specified traits like restless—quiet, humour—-humourless, and so on. An item in 
each such pair designating the favourable trait IS given a positive score (say +1) and an item 
displaying the unfavourable trait is given a negative score ( say —1). A person’s score on each trait 
isthe algebraic sum of positive and negative scores given by his peers in the group. 
Data obtained on the basis of sociometric tests are usually analyzed by three principal methods: sociometric matrix, sociogram and sociometric index. Of these three, the first two are very popular and hence, will be given a wider coverage than the last one.

Sociometric matrix, also known as sociomatrix, is a simple cross-tabulation or rectangular array of n × n dimensions, n being equal to the number of individuals in the group. The meaning of the matrix can be illustrated through an example: Suppose a group of eleven students (A, B, C, D, E, F, G, H, I, J and K) in a class was put a sociometric question: "With which two members of this group would you like to go for a picnic?" If a member chooses another member, it is displayed by 1; if he does not make a choice, it is shown by 0. The data (hypothetical) are presented in Table 12.5.






Table 12.5 Sociometric matrix of an 11-member group with two choices (11 × 11 matrix)

Chosen:  A  B  C  D  E  F  G  H  I  J  K
[Rows for choosers A to K: cell entries illegible in the source]
Sum:     0  4  1  2  2  1  0  2  2  2  2
It is convenient to read the above matrix from left to right, row-wise. The first row can be read as A, where A does not choose C, F, G, H, I, J and K but chooses B, D and E. The matrix is usually analyzed by examining who chooses whom. Ordinarily, there are three kinds of choices: simple, mutual and no choice. A simple choice is one in which one person chooses the other person but he (the other person) does not choose him (the first person). Thus, the choice is one-way. For example, in Table 12.5, C chooses F but F does not choose C; G chooses C but C does not choose G. Mutual choices are the two-way choices, that is, both persons choose each other. For example, in the above matrix B chooses D and D also chooses B; B chooses H and H also chooses B. The sum of a column of the matrix indicates to what extent a particular member is chosen by other members of the group. We can say that the greater the sum, the higher the popularity. The matrix indicates that B receives 4 choices, which means he is chosen by four members of the group. Thus, B is popular. A and G are not chosen by anyone. D, E, H, I, J and K each receive two choices; C and F receive one choice each. Obviously, C and F are not popular in the group. The technique of sociomatrix has the following advantages:
1. The relationship between each single pair is well recorded. 

2. Joint relationships between all pairs are also recorded. 

3. Two or more than two matrices can be easily combined and compared. 

4. Sociomatrix is valuable in mathematical analysis and synthesis. 
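As a rough illustration of how a sociomatrix is built and read, the sketch below codes choices as 1 and non-choices as 0, sums each column, and separates mutual from simple (one-way) choices. The four-member group (P, Q, R, S) and its choices are hypothetical and are not the data of Table 12.5.

```python
# Hypothetical group: each key is a chooser, each list holds the persons chosen.
members = ["P", "Q", "R", "S"]
choices = {"P": ["Q", "R"], "Q": ["P", "R"], "R": ["Q"], "S": ["Q", "P"]}

index = {m: i for i, m in enumerate(members)}
n = len(members)

# n x n sociomatrix: row = chooser, column = chosen, 1 = choice, 0 = no choice.
matrix = [[0] * n for _ in range(n)]
for chooser, chosen_list in choices.items():
    for chosen in chosen_list:
        matrix[index[chooser]][index[chosen]] = 1

# Column sums: how many choices each member receives.
column_sums = {m: sum(matrix[r][index[m]] for r in range(n)) for m in members}

# Mutual choices: both cells (i, j) and (j, i) are 1; each pair is listed once.
mutual = [(members[i], members[j])
          for i in range(n) for j in range(i + 1, n)
          if matrix[i][j] and matrix[j][i]]

# Simple choices: i chooses j but j does not choose i.
simple = [(members[i], members[j])
          for i in range(n) for j in range(n)
          if i != j and matrix[i][j] and not matrix[j][i]]

print(column_sums)
print(mutual)
print(simple)
```

In this toy group Q has the largest column sum (3), while S receives no choices at all.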

The sociometric matrix is, however, considered an inferior graphic device to the sociogram (to be discussed later) in studying group structure.

Sociogram, or directed graph, is another method for analyzing the data obtained from sociometric tests. A sociogram may be defined as a pictorial technique to represent a set of nominations or choices. Here, a simple or one-way choice is represented by a one-arrowed line like → and the mutual or two-way choice is represented by a two-arrowed line like ↔. A sociogram of the data presented in Table 12.5 is given in Figure 12.2.

An analysis of Figure 12.2 reveals several interesting points about the structure of the group. B is surrounded by arrows, which means he is chosen by several members of the group. A is a rejectee as he chooses B, D and E but none of them chooses him. A has no arrowheads pointing at him. I, J and K form a clique, which refers to that structure of the group in which three or more persons choose each other. The other common patterns are mutual pairs and isolates. A mutual pair refers to a pair in which the two members like each other but no other person in the group likes them. C and F form a mutual pair. Isolates are those who do not choose anyone and are not chosen by anyone in the group. G is an isolate.

A comparative study of the sociometric matrix and the sociogram reveals that both tell the same story. The form of the sociomatrix is tabular whereas the form of the sociogram is pictorial. The obtained sum of choices in each column in the sociometric matrix is equal to the sum of arrowheads pointing at a particular person in the sociogram. B receives four choices in Table 12.5 and he has four arrowheads pointing at him. This is similar in the case of other persons in the sociogram and in the sociomatrix. As a matter of fact, sociogram and sociomatrix supplement each other and in a research work, the two should be synthesized and used in a complementary fashion.

Fig. 12.2 A sociogram of the choices of eleven persons


Sociograms have certain limitations. Sociograms tell the investigator only which persons are chosen or rejected and not the reason for rejection or acceptance. Sociograms revealing covert group structure are only a tentative picture of the group. A sociogram prepared at the beginning of the session of a certain class will be different from the sociogram prepared at the end of the session of the same class. Moreover, choices and rejections in the group are determined by several factors like sex, caste, prestige of the persons being chosen, rivalries, etc. Sociograms provide no way to check to what extent these factors are influencing the rejections or acceptances in the group. A person may not choose anybody for making friends nor be chosen by any other person in the group, but this does not mean that he has no feeling of friendship at all in the group. Therefore, Thorndike & Hagen (1977, 484) have rightly concluded, "a sociogram is at best a rough and tentative picture of social currents and climate of the group."

Sociometric indices are the next important methods for analyzing sociometric data. One common index is the choice status of a person, which is given by Equation 12.1 as indicated below.

$$C_s = \frac{\sum C}{n - 1} \qquad (12.1)$$

where $C_s$ = the choice status of the person, $\sum C$ = sum of choices a person receives and $n$ is the number of persons in the group. Illustrating the calculation of $C_s$ from the data of Table 12.5, B can be said to have $C_s = 4/(11 - 1) = 0.40$ and the $C_s$ of E $= 2/(11 - 1) = 2/10 = 0.20$. The value of $C_s$ becomes the direct index of the individual's popularity in the group.
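The arithmetic of Equation 12.1 can be checked with a short Python sketch. The function name `choice_status` is an illustrative assumption; the choices received are the column sums reported in the text for Table 12.5 (B receives four; D, E, H, I, J and K two each; C and F one each; A and G none).

```python
def choice_status(choices_received, group_size):
    """Choice status: choices received divided by (n - 1)."""
    if group_size < 2:
        raise ValueError("a group needs at least two persons")
    return choices_received / (group_size - 1)

# Choices received by each member, as reported for Table 12.5.
received = {"A": 0, "B": 4, "C": 1, "D": 2, "E": 2, "F": 1,
            "G": 0, "H": 2, "I": 2, "J": 2, "K": 2}

status = {person: round(choice_status(count, len(received)), 2)
          for person, count in received.items()}

print(status["B"])  # 0.4
print(status["E"])  # 0.2
```

Dividing by n - 1 rather than n reflects that a person cannot choose himself, so n - 1 is the largest number of choices anyone can receive.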

Besides the above sociometric analyses, some sociometric instruments have been developed for evaluating the interpersonal relationships among pupils in the class. One such instrument is the Ohio Social Acceptance Scale in which each pupil rates every other pupil in one of six categories: (i) My very, very best friends; (ii) My other friends; (iii) Not friends, but okay; (iv) Don't know them; (v) Don't care for them; (vi) Dislike them.

On the basis of the ratings by all pupils, a sociometric score is computed for each pupil. The score indicates the extent of his acceptance in the group. The Syracuse Scales for Social Relations is another important sociometric instrument to assess the social relations in a class. The scale has three separate forms of which the first form is meant for the pupils of elementary classes (Grades 5 and 6), the second form is for pupils of junior high school (Grades 7 to 9) and the third form is for pupils of senior high school (Grades 10 to 12). Each pupil is asked to rate every other pupil on two important psychological needs in each of the three forms. In other words, each pupil rates every other classmate as a source of aid when he is in trouble with some personal problem (need succorance), and each pupil also rates these classmates as a source of support in his effort to attain some personal goal, which may bring social commendation and approval (need achievement). Results are usually summarized in the form of mean ratings received by each pupil. The mean ratings indicate the extent to which each pupil feels favourable towards his classmates.


A sociometric matrix usually gives the following information about the group:

(i) In a group, there are a few persons who are liked by many. Such persons are very popular in the group. Such persons have the highest score at the bottom of the column of the sum, that is, 4.

(ii) In a group, there are persons who like others but are themselves not liked by others. Such persons are called rejected persons. In Table 12.5, A is a rejectee.

(iii) In a group, there are persons who form a clique of three persons in which all the three like one another but no other person of the group likes them. In Table 12.5, I, J and K form a clique. In a clique, the number of persons may be more than three.

(iv) In a group, sometimes a mutual pair (of two) is formed in which both the persons like each other. No third person likes them nor does either of the two like the other members of the group. In Table 12.5, C and F constitute an example of a mutual pair.

(v) In a group, there are persons who don't like any member of the group nor are themselves liked by any member of the group. Such persons are called isolates. In Table 12.5, G constitutes the example of an isolate.
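Three of the patterns listed above (the popular member, the rejectee and the isolate) follow mechanically from who chooses whom, as the following Python sketch suggests. The function `classify`, the label "other", the rule that "popular" means receiving the most choices, and the five-member group are all assumptions made for illustration; detecting cliques and mutual pairs would need further checks not shown here.

```python
def classify(choices):
    """Label each member as popular, rejectee, isolate or other.

    choices maps every member to the list of members he or she chooses.
    """
    received = {member: 0 for member in choices}
    for chosen_list in choices.values():
        for chosen in chosen_list:
            received[chosen] += 1

    top = max(received.values())
    roles = {}
    for member, chosen_list in choices.items():
        if received[member] == 0 and not chosen_list:
            roles[member] = "isolate"    # chooses no one, chosen by no one
        elif received[member] == 0:
            roles[member] = "rejectee"   # chooses others, chosen by no one
        elif received[member] == top:
            roles[member] = "popular"    # receives the most choices
        else:
            roles[member] = "other"
    return roles

# Hypothetical group of five; T makes no choice and receives none.
choices = {"P": ["Q", "R"], "Q": ["P", "R"], "R": ["Q"],
           "S": ["Q", "P"], "T": []}

roles = classify(choices)
print(roles)
```

Here Q is labelled popular, S a rejectee and T an isolate.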

Group sociometric measures are perhaps still more common. A measure of cohesiveness in a group through a sociometric index can be computed by Equation 12.2.

$$C_o = \frac{\sum (i \leftrightarrow j)}{\dfrac{n(n-1)}{2}} \qquad (12.2)$$

where $C_o$ = group cohesiveness,
$\sum (i \leftrightarrow j)$ = sum of mutual choices (or mutual pairs),
$n$ = number of persons in the group.

In Table 12.5, $\sum (i \leftrightarrow j) = 6$, that is, there are six mutual pairs: C & F, I & J, J & K, I & K, B & D and B & H. Thus, the group cohesiveness by Equation 12.2 will be

$$C_o = \frac{6}{\dfrac{11(11-1)}{2}} = \frac{6}{55} = 0.11$$

Obviously, the level of group cohesiveness in the group is low.
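A minimal Python sketch of Equation 12.2 follows. It counts mutual pairs from a record of choices and divides by the number of possible pairs, n(n - 1)/2. The function names and the small four-member group are hypothetical; the final lines reproduce the figures quoted for Table 12.5 (six mutual pairs among eleven persons).

```python
def count_mutual_pairs(choices):
    """Count unordered pairs in which both persons choose each other."""
    pairs = set()
    for chooser, chosen_list in choices.items():
        for chosen in chosen_list:
            if chosen != chooser and chooser in choices.get(chosen, []):
                pairs.add(frozenset((chooser, chosen)))
    return len(pairs)

def group_cohesiveness(mutual_pairs, n):
    """Mutual pairs divided by the number of possible pairs."""
    possible_pairs = n * (n - 1) / 2
    return mutual_pairs / possible_pairs

# Hypothetical four-member group with two mutual pairs (P-Q and Q-R).
choices = {"P": ["Q", "R"], "Q": ["P", "R"], "R": ["Q"], "S": ["Q", "P"]}
toy_pairs = count_mutual_pairs(choices)
toy_index = group_cohesiveness(toy_pairs, len(choices))

# Figures quoted in the text for Table 12.5: 6 mutual pairs, 11 persons.
table_index = group_cohesiveness(6, 11)
print(toy_pairs, round(toy_index, 2))
print(round(table_index, 2))  # 0.11
```

The index runs from 0 (no mutual pairs) to 1 (every possible pair is mutual), which is why 0.11 is read as low cohesiveness.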


There is still another sociometric index which is used when the number of choices each individual is permitted is limited (such as two choices or three choices or any fixed number of choices).

In such a situation, Equation 12.3 is applied:

$$C_o = \frac{\sum (i \leftrightarrow j)}{\dfrac{dn}{2}} \qquad (12.3)$$

where $d$ = number of choices each individual is permitted; other symbols are defined like those in Equation 12.2.

Suppose each student of a group ($n = 5$) is asked to name two students with whom he wants to play a game. Further, suppose that $\sum (i \leftrightarrow j)$ in that group = 4. Therefore, $C_o$ by Equation 12.3 will be:

$$C_o = \frac{4}{\dfrac{2 \times 5}{2}} = \frac{4}{5} = 0.80$$

Obviously, the degree of cohesiveness in the group is high.
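The limited-choice index of Equation 12.3 differs from Equation 12.2 only in its denominator: with d choices allowed per person, at most dn/2 mutual pairs can occur. A short Python sketch, with an illustrative function name, reproduces the worked example (n = 5, d = 2, four mutual pairs).

```python
def cohesiveness_limited(mutual_pairs, d, n):
    """Cohesiveness when each of n persons is allowed exactly d choices."""
    max_mutual_pairs = d * n / 2
    return mutual_pairs / max_mutual_pairs

# Worked example from the text: five students, two choices each, 4 mutual pairs.
index_limited = cohesiveness_limited(4, d=2, n=5)
print(round(index_limited, 2))  # 0.8

# For comparison, the unrestricted denominator n(n - 1)/2 for the same group.
index_unrestricted = 4 / (5 * (5 - 1) / 2)
print(round(index_unrestricted, 2))  # 0.4
```

The comparison shows why the restricted denominator is used: against all ten possible pairs the same group would look only moderately cohesive, although four of the five attainable mutual pairs were actually formed.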
One advantage of the sociometric test is that it is an easy and economical means of having information about the affected structure of the group. Another advantage is that such a test possesses the trait of flexibility. As such, it can be used easily in different situations for different purposes.

Despite these advantages, the sociometric test has some disadvantages. One great disadvantage is that such a test can be advantageously used only when the number of persons in the group is less than 20. When the number exceeds 20 or increases substantially, a sociometric test proves to be a cumbersome means for having information about the affected structure of the group.


PROBLEMS IN OBTAINING EFFECTIVE RATINGS

The factors which limit or affect sound and valid ratings may be conveniently divided into two parts: factors which limit the raters' willingness to rate the object or persons honestly in accordance with the given instruction, and factors which limit the raters' ability to rate accurately and consistently, even when the raters intend to do so. A detailed discussion of these two sets of factors is presented below.


Factors Affecting Raters’ Willingness 

In general, rating is an arduous and boring task. In order to obtain sound ratings, the investigator refines the instrument carefully by introducing elaborate procedures and precautions and gives very planned instructions to be followed by the raters in order to get rid of subjective impressions and superficial reactions of the raters. Theoretically, this is good and desirable but practice has revealed that some raters are not willing to take pains in rating the persons or objects in accordance with the set procedures and instructions. As a consequence, they do the ratings perfunctorily and hurriedly, keeping an eye more on finishing the task than upon making an accurate judgement. This has the natural consequence of lowering the effectiveness of ratings by bringing down their reliability and validity.

Raters' identification with the ratees may be another factor that limits the effectiveness of ratings. Often the rater is close to the persons being rated. He may identify with some persons positively and with some persons negatively. Persons identified positively are likely to be rated high despite some unfavourable characteristics, whereas persons identified negatively are likely to be rated low despite some favourable and sound characteristics. In both the situations, the rating would be a misleading index and would considerably lower the effectiveness of the ratings.

Thus the raters' willingness to rate honestly and conscientiously is limited by two factors: first, they may not be ready to take pains to do the rating according to set procedures and instructions; and second, they may identify with persons being rated either positively or negatively, which will colour their judgement.


Factors Affecting Raters’ Ability 
Some raters are willing to make their ratings objective and accurate but some hindrances limit their best intention to do effective ratings. Such hindrances directly affect the raters' ability to rate accurately. Thorndike & Hagen (1977) have mentioned the following factors, which are likely to affect the raters' ability to rate effectively.

1. Opportunity to observe the individuals being rated: For effective rating, close contact between the rater and the persons being rated is essential. If the persons are not in close contact with the rater, he may not be able to rate them with respect to the trait under study, even if he has the intention to do so. For sound ratings, it is also essential that the rater be given a specific opportunity to make observations of the persons' behaviour. When a teacher is asked to rate his pupils on the trait of initiative and he is rating only on the basis of his classroom observation, his rating may not represent correct judgement. A pupil may take initiative in his classroom activities but the same pupil may not take initiative in a picnic, at home and in other social gatherings. Thus a rating based simply upon general observation in the classroom may be a misleading index. General and specific opportunity to observe the individuals being rated are thus the factors which influence the effectiveness of ratings.
2. Covertness in the traits being rated: Some personality traits may be revealed directly in interaction with other persons or events whereas some traits are not directly revealed, and the presence or absence of such traits is largely dependent upon the wise inferences and intentions of the investigator. The former type of traits is the objective or overt traits of personality and the latter type of traits is the subjective or covert traits of personality. Talkativeness, dominance, outgoingness, punctuality, etc., are some examples of overt traits whereas feelings of insecurity, anxiety, self-satisfaction, ego-strength and emotional control are some of the covert traits of personality. In general, the covert traits are rated with lower reliability and validity than the overt traits. The reason is that the rating of covert traits is largely dependent upon the inferences made by the raters and they are least involved in outward interaction with other persons. Since very little regarding such traits comes to the surface, they are rated with greater difficulty and inaccuracies. Thus, the covertness of traits is a factor which lowers the reliability and validity of the ratings.


3. Vagueness in the meaning of the trait rated: Some traits or dimensions to be rated are vague and abstract ones. As a consequence, their meaning varies from rater to rater and this naturally affects the consistency in ratings. On the other hand, when traits or dimensions are well defined, broad and have a pin-pointed reference, there occur less variations in their meaning from rater to rater. As such, their ratings are more consistent. Talkativeness is one such well-defined trait and it is very likely that all the raters will mean the same thing when they use this term. When a group of teachers is asked to rate the pupils in a class on the trait of being talkative, variations are less likely to occur in ratings by the teachers because all of them would mean the same thing. On the other hand, when the same teachers are asked to rate the same pupils on the trait of self-sufficiency (an example of a vague dimension), the ratings are likely to lose consistency (or reliability) because teacher A may use this term to mean that a student has all things necessary for the development of his self; teacher B may mean that a student has self-respect, self-control and self-satisfaction; and teacher C may mean simply that a student has sufficient physical growth to ensure proper development of his self or personality. Thus vagueness or ambiguity in the meaning of the trait or dimension being rated is likely to influence the reliability of the ratings.

4. Nonuniform standard of reference: Most of the rating scales require the raters to rate the ratees in any one of different categories:

Superior, excellent, very good, good
Satisfactory, unsatisfactory
Best, good, average, fair, poor
Good, sufficient, average, below average, less

When a rater rates a ratee as superior, the question arises as to what the standard is against which a ratee is being classified as superior. Is he being compared with the top 5% or top 10%? Or, is he being compared with the middle cases? The raters have no uniform standard before them so that the interpretation of the category 'superior' may not have an identical meaning to all. In the absence of a uniform standard, the interpretation of a category varies from rater to rater. One rater's 'superior' may be another rater's simply 'very good'. This naturally lowers the consistency of ratings and thereby, their reliability.


5. Raters' personal characteristics: Raters' personal characteristics also tend to spoil the ratings. Some raters are conservative and, therefore, they tend to rate persons in the middle. They rarely rate a person very high or very low. Some raters are tough and, therefore, rarely rate anybody high. These are popularly known as errors in ratings which, because of their special importance, have been separately explained later. Besides these general characteristics, some personal experiences also tend to affect the ratings. A teacher who is annoyed with the behaviour of a student in a classroom is likely to rate him low; a supervisor in an industry may be pleased with the way a worker behaves with him and therefore, is likely to rate him high whether his overall performance is satisfactory or not. An officer may be annoyed with his stenographer and typist simply because she ignores him. In such a situation, she is likely to be rated low, no matter how good her performance may be. Thus personal experiences are likely to colour their judgement and also their ratings. In such situations, the ratings done by one rater cannot be compared with the ratings done by another rater. Personal experiences, therefore, are the potent factors which influence the ratings.

In view of the above limitations, it is suggested that rating scales can be an effective means of appraising others only when the points given below are given due weightage.


appraising othe 
METHODS OF IMPROVING EFFECTIVENESS OF RATING SCALES 

Psychologists have devised some means to improve the effectiveness of such rating scales. Given below are the important means.

Refinement in Stimulus Variables of the Rating Scales

In order to improve the effectiveness of rating scales, it is essential to make refinements in the stimulus variables of the rating scales. By stimulus variable (S) is meant the traits, dimensions or qualities to be rated. As stated earlier, some traits are objective and easily observable and, therefore, can be rated with greater accuracy and consistency than traits which are subjective and ill-defined. Researches on the different types of traits appropriate for higher accuracy, consistency and objectivity in ratings have revealed that traits like punctuality, intelligence, friendliness, clearness, leadership, physical health, scholarship and independence are rated with higher consistency and accuracy whereas traits like courage, co-operativeness, depression, popularity, sympathy and kindliness are rated with wider disagreements among raters. From these various researches some guidelines regarding the traits have been prepared which, if followed, are likely to produce higher agreement and consistency in ratings. These guidelines are given as follows.

1. Each trait should clearly refer to only one type of activity.

2. A trait should not represent a combination of a number of traits that vary independently.

3. Traits should be such as can be defined in objective, unequivocal and specific words.

4. Traits should be judged on the basis of past and present accomplishment rather than on the basis of future accomplishment.

5. In defining a trait, words like 'often', 'always', 'very', 'extremely' should be avoided because these are general words and give no hints regarding the appropriateness.


Refinement in Response Variables of the Rating Scales 
The response variable of the rating scales refers to the response option, which consists of numerical or adjectival categories. As mentioned earlier, these categories, in terms of which the rating is done, are very subjective. One person's 'superior' is another person's 'outstanding'. Not only this, these categories also allow the free play of a rater's bias. It is, therefore, argued that refinements in response categories are essential so that the objective interpretation of categories and the objectivity in ratings can be achieved. One of the most fruitful attempts in this direction has been to develop a rating form that forces the raters to discriminate between two or more than two equally desirable or undesirable response options. The second attempt is to develop a form which controls raters' differences in judging standards. An example of the first attempt is the forced-choice rating pattern and examples of the second attempt are the percentage scale, graphic scale, behaviourally anchored scale, man-to-man scales, etc. All these techniques have already been discussed in detail.


Improvement in Rating Procedures

Even when the stimulus variable and the response option of the rating scale have been refined, the ratings cannot be objective and consistent unless their procedures are improved. There are several factors that go together in improving the rating procedures. These factors are given below.

1. Selection of raters: The selection of proper and appropriate raters is one of the fundamental aspects of rating procedures. A proper and ideal rater is a person who has been in close contact with the ratees and has had several opportunities to observe them in situations relevant to the trait on which they are to be rated. If the rater does not know the ratees intimately, he cannot rate them well, no matter how much he is motivated to do so. Sometimes, it may be necessary for selected raters to have a certain amount of training in the type of judgement they are required to make and in the use of rating instruments. Researches conducted in industrial as well as military settings have revealed that training which defines the aims and purposes of the ratings and illustrates the meaning of traits to be rated tends to increase the validity of ratings by decreasing constant errors like the halo effect, error of leniency and error of central tendency. Besides these considerations, some other factors related to raters should also be given proper attention for ensuring effective ratings. For example, raters should be allowed sufficient time for their rating and the raters' educational and professional backgrounds should be similar to the ratees' educational and professional background.
2. Improving the reliability of ratings: One general limitation of rating scales is that their between-raters reliability is very low. That is, if two or more independent raters have rated the same persons, correlations (a measure of consistency) between the raters are generally very poor, indicating a low reliability. To improve the reliability of the ratings, ratings done by independent raters may be pooled together provided all the independent raters are equally qualified and/or know the ratees equally well. Pooling of the ratings functions in the same way as increasing the length of a test: just as the reliability of a test is increased after its length is increased, so also the reliability of ratings is increased after independent ratings are pooled. As with a lengthened test, the Spearman-Brown formula (Equation 5.17) may be used to estimate the reliability of pooled independent ratings. To illustrate, suppose the reliability of one rater is 0.45. Then the reliability of two raters and three raters, by substituting in Equation 5.17, would be:

$$\frac{2(0.45)}{1 + (2 - 1)(0.45)} = 0.62 \quad \text{(for two pooled ratings)}$$

$$\frac{3(0.45)}{1 + (3 - 1)(0.45)} = 0.71 \quad \text{(for three pooled ratings)}$$

But in many practical situations, the pooling of independent ratings is not feasible because the raters may not be equally qualified and/or may not know all the ratees equally well. For example, in a primary school each class has only one class teacher who knows all the pupils of his or her class more or less equally well. Other teachers may not know the pupils of the class to that extent. In such a situation, if the independent ratings of a particular class done by a group of several teachers (including the class teacher) are pooled, the reliability of ratings is likely to decrease rather than increase. A similar situation may occur in the case of rating workers in an industry or clerks in an office. Only the supervisor who is the officer-in-charge of the workers will know them well and only the immediate officer who is in charge of the group of clerks will know them well. Other supervisors or officers may not know them well and hence, the independent ratings by a group of supervisors or officers should not be pooled to estimate the reliability of the ratings. If it is done, chances are that the reliability will go down.
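The pooled-rating estimates quoted above (0.62 for two raters and 0.71 for three, starting from a single-rater reliability of 0.45) can be reproduced with the standard Spearman-Brown expression k·r / (1 + (k - 1)·r). The sketch below is illustrative only: the function name is an assumption, and it presumes, as the text requires, that all pooled raters are equally qualified and know the ratees equally well.

```python
def spearman_brown(single_rater_reliability, k):
    """Estimated reliability of k pooled, equally competent raters."""
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)

# Reliability of one rater, as in the worked example.
r_one = 0.45
pooled = {k: round(spearman_brown(r_one, k), 2) for k in range(1, 6)}

for k, value in pooled.items():
    print(k, value)
```

The gain from each additional rater shrinks as more raters are pooled, so a few well-informed raters usually contribute most of the attainable improvement.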


ERRORS IN RATINGS 


Ratings are subject to varieties of errors, some of which have been referred to earlier. One general 
class of errors called constant errors (because they tend to throw the ratings constantly in one 
direction) consists of halo effect, error of severity, error of leniency and error of central tendency. 
Besides these constant errors, there are contrast errors, logical errors and proximity errors in 
ratings. A detailed discussion of each of these errors is presented below. 




Halo Effect

Halo effect is one of the very common constant errors in rating. The error was first mentioned by Wells in 1907 and was named so by Thorndike (1920). In the words of Anastasi (1968, 420), halo effect refers to "a tendency on the part of the raters to be unduly influenced by a single favourable or unfavourable trait, which colours their judgement of the individual's other traits". Thus halo effect is produced when the rater rates the ratees in the constant direction of a general impression which he has formed earlier. The common example of the halo effect is the tendency on the part of a teacher to rate the quality of answerbooks of a pupil whom he likes as higher than the answerbooks of a pupil whom he does not like. Likewise, a teacher who is very much influenced by the outstanding performance of a student in a game may rate his answerbook higher, though the quality of the answers may not be up to the standard. Symonds (1925) has reviewed a series of experiments conducted to locate situations in which the halo effect is most obvious and has concluded that it occurs mostly in a trait which is not clearly defined, not easily observable, difficult to single out frequently, and involves moral importance or belongs to the traits of character.

Halo effect has two marked effects upon ratings. One obvious effect is to throw the ratings in either a positive or negative direction depending upon the general impression, and to that extent, the rating loses reliability and validity. The second effect of halo is to make the positive correlations between traits being rated spuriously high. Halo effect, therefore, must be controlled, though it is very difficult to avoid. One general way to reduce the halo effect is to define the trait in observable units of behaviour. Halo effect may also be counteracted when the rater rates all ratees on one trait before going on to the next trait. Thus when several ratees are to be rated by the same rater on several traits, the practice of rating all the ratees on one trait at a time forces the rater to pay attention to each trait separately rather than to the general impression of each ratee. Halo effect may be reduced by arranging the scale points in such a way that the desirable end of some traits falls on the right-hand side and the desirable end of some other traits falls on the left-hand side. This arrangement automatically prevents the rater from rating a ratee on the right-hand side for his general desirable impression and from rating him on the left-hand side for his overall undesirable impression. Forced-choice formulas may also be adopted to reduce the halo effect in ratings.


Error of Severity 


The error of severity refers to a constant tendency (irrespective of the trait being rated) on the part of the rater to rate the ratees too low. Such persons are 'hard raters' and are usually reluctant to rate any person or object high. Kerlinger (1973, 549) defines the error of severity as 'a general tendency to rate all individuals too low on all characteristics.' The forced-choice type of rating, where all options appear equally attractive and where the rater is unable to throw the ratings low at his will, is one convenient technique for reducing the error of severity.


Error of Leniency 


As its name implies, the error of leniency refers to a constant tendency, irrespective of the traits being rated, to rate the ratees too high. Such persons are 'easy raters' and try to concentrate on the upper end of the scale. Such raters are reluctant to rate anybody unfavourably. Like the error of severity, this error can be controlled by the forced-choice technique.


Error of Central Tendency 


Error of central tendency refers to a constant tendency on the part of the rater to avoid extreme 
ratings and place his ratings in the middle or average category. Kerlinger (1973, 549) defines the 
error of central tendency as ‘the general tendency to avoid all extreme judgements and rate right 
down the middle of a rating scale.’ When the rater is asked to rate individuals who are not 
well known to him, he hesitates to rate them at the extremes and, therefore, prefers to rate them in 
the middle. In order to counteract this error, Guilford (1954) suggests that greater variation in 
meaning should be introduced between the steps of the scale at the extremes than between the 
steps at the middle. 


Besides these constant errors, other types of errors common to rating scales are given below. 


Contrast Error 

Contrast error, first pointed out by Murray (1938), refers to a tendency on the part of the raters to 
rate persons or objects in a direction which is in contrast to the trait he himself possesses. Thus, 
through this error the rater projects his own attitude or bias in the ratees. For example, a teacher 
who is very punctual in coming to the class may rate other students to be less punctual than he is. 
Similarly, a supervisor who possesses the trait of orderliness may rate others in a direction 
opposite to this trait, that is, he may rate others to be less orderly. 


Proximity Error 


Proximity error, first pointed out by Stockford & Bissell in 1943, is one which occurs due to 
proximity or nearness of two traits being rated. The nearness between two or more traits may be 
in terms of meaning or in terms of time. Co-operativeness and friendliness are more or less two 
similar traits in meaning. When the rater is asked to rate such traits without intervention by some 
dissimilar traits, the correlation between the ratings tends to be spuriously high. However, when 
these traits are separated by dissimilar traits, the correlations between adjacent traits and remote 
traits are nearly equal, indicating the cancellation of proximity error. Thus proximity error tends to 
make the intercorrelations between adjacent traits (which are similar) higher than those of remote 
traits, though their degree of actual association may be equal. Proximity error may also be 
counteracted by allowing a sufficient time interval between the rating of one trait and another. 


Logical Error 


Logical error is similar to proximity error and halo effect in the sense that it also tends to increase 
the intercorrelation of the ratings of the two or more traits. Logical error, pointed out by 
Newcomb (1931), is one where the rater gives more or less a similar rating to traits which seem to 
him somehow logically related with each other. Thus the logical presupposition or similarity 
(about the traits to be rated) in the mind of raters is the basis of the error. The logical error is most 
obvious where the traits to be rated are abstract and semantically overlapping. This error can be 
reduced, at least partially, when the raters are given the most objective and clearly defined traits 
for ratings. 


EVALUATION OF RATING SCALES 


Despite their limitations, rating scales undoubtedly continue to be one of the most promising 
appraisal techniques. Rating scales have a wider range of application. They can be used even by 
those persons who are not trained. They can be employed when a larger number of stimuli are to 
be rated and in this task, the rating scales yield far more consistent results than their rival, the 
ranking method or the method of rank order. The common observation is that when the number 
of stimuli to be ranked is large, say for example 80 or 90, the ranking becomes a difficult task and, 
not only that, even the person who is doing the ranking gets bored. Rating scales, and 
particularly the graphic scale, present a very interesting task for the rater and he seldom gets 
bored even with a long list of traits to be rated. Some comparative studies of the ranking method and 
the rating method support these facts. Conklin & Sutherland (1923) compared the ranking method 
with the rating scales and found that the rating method yielded less variance than the ranking 
method. Symonds (1925) has reported that ordinarily the results yielded by rating scales are in no 
way less reliable than the results yielded by the ranking method. So far as the various errors 
involved in ratings are concerned, they are such that they can be controlled and corrected, if 
proper care is taken. Therefore, the rating scales, on the whole, have a bright future as an 
appraisal technique of traits, objects, etc. 


MEANING AND FEATURES OF SECONDARY DATA 

By secondary data we mean those data that are in actual existence in accessible records, having 
been already collected and treated statistically by the persons maintaining the records. In simple 
words, secondary data are data that have already been collected, presented, tabulated, treated 
with necessary statistical techniques and from which conclusions have been drawn. Therefore, collecting 
secondary data never means doing some original enumeration; rather, it simply means obtaining 
data that have already been collected by others, including government departments. Once 
primary data have been collected originally by someone else, they become secondary data in the 
hands of all other persons who are desirous of handling them for their own purposes. If Mr A collects 
some data originally, then the data are primary data to Mr A, but the same data, when used by Mr B, 
become secondary data to Mr B. 

The major features of secondary data are as under: 
(i) Secondary data are ready-made and in this sense, they are time saving. The researcher 
easily saves the time and labour involved in conducting interviews or administering 
questionnaires. 
(ii) The form and content of secondary data are shaped by others. Clearly, this feature of the 
secondary data can limit the overall scientific value. 
(iii) Secondary data are not limited in time and space (Festinger & Katz 1953). It means that 
the researcher using secondary data need not have been present either when or where 
they were collected. 

Keeping these features in view, it can broadly be said that the secondary data are items of 
information originally collected for a purpose other than its present scientific use. 




Sources of Secondary Data 
There are many sources of secondary data. Some of the important ones are as under. 

1. Private sources of secondary data: Private sources of secondary data are strictly 
personal documents like letters, diaries, and other biographical materials such as individual 
life histories. These are considered very useful in examining social conduct. Another private 
source of secondary data exists in the files and published materials of private organizations that 
characterize most of the societies in the world. 

2. Public sources of secondary data: Public sources of secondary data are those sources 
based on the information available in various data archives and other related sources. The 
following are some such important sources. 

(i) Central and state government publications 
(ii) Publications done by international organizations like UNO, UNESCO, WHO, etc. 
(iii) Official publications as well as reports of municipalities, zila parishads, etc. 
(iv) ... government publications 
(v) Reports and publications of various commissions like the UGC, ICSSR, trade 
associations, corporate banks, stock exchanges, etc. 
(vi) Well-known newspapers and journals 
(vii) Internet and various website sources 
(viii) Publications brought out by Supreme Courts, High Courts and Lower Courts 

The above list of secondary data, however, can’t be said to be thorough or complete because 
there are a number of other sources also such as manuscripts of eminent scholars, 
labour bureaus, records of business firms, etc. 




Verification of Secondary Data 
It is suggested that before accepting secondary data, it is always essential to scrutinize them 
properly in regard to their accuracy and reliability. Therefore, verification of secondary data is 
essential and it should be carried out by checking the following: 
(a) Whether the data were collected at the proper time 
(b) Whether the organization that has collected the data is reliable 
(c) Whether the data have timely relevance and are not outdated for the present research 
(d) Whether the appropriate statistical tools, if any, were used by the primary data 
investigators 
(e) Whether the enumerators of facts and figures had any prejudice or bias, or showed 
negligence, hurry or carelessness in their compilation work 
(f) Whether the data enumerators had ensured proper standards of accuracy and, if so, up to 
what degree such accuracy has been shown 


Major Uses of Secondary Data 
There are some major uses of secondary data, which may be viewed from two 
angles. 

(i) Secondary data can be viewed from the standpoint of how they can really be used to 
generate scientific information. 
(ii) Secondary data can also be viewed in terms of ways they promote the overall objectives 
of the scientific enterprise. 

For those social scientists who support the viewpoint of the first use, there are three ways to 
proceed ahead. 

(b) Such secondary data may serve as auxiliary documentation, that is, they can ... In other 
words, when secondary data are appropriately used, they can provide a relatively crude 
form of replication. 

So far as the uses of secondary data in terms of overall objectives of the scientific enterprise are 
concerned, all the above-mentioned applications are considered important. 

In a nutshell, it can be said that whatever uses secondary data sources have for scientific 
activity depend upon the subjective ingenuity of the investigator in making use of them. 


Advantages and Disadvantages of Secondary Data Analysis 
The major advantages of secondary data analysis are as under: 
(i) Such analysis saves time and money (Useem 1973). 
(ii) Such analysis provides a good and real possibility of using the work of others to broaden 
the base from which scientific generalizations can be easily made. This is especially so 
when information from several cultural settings is being studied. 
(iii) Secondary data analysis provides a ground for using such data in verifying the findings 
already obtained in primary data by an investigator. In fact, by turning to the sources 
already containing relevant information, the verification process is rapidly enhanced. 

The major disadvantages of secondary data analysis are as under: 
(i) Even under the most optimal conditions, secondary data analysis poses difficult problems 
for the researchers. Those who hope to save time and reduce the amount of tedium 
connected with procedural details such as coding and classifying may become 
disappointed due to the lack of accuracy and reliability. 
(ii) Another problem with secondary data analysis is that it is sometimes not possible to 
comprehend the process of original data gathering. Simply because data are available does 
not mean that reasons are given for why they were gathered. 
(iii) In secondary data analysis, the knowledge of the whereabouts of the sources is not 
necessarily available to all social scientists on an equal basis. Anyone who is familiar 
with the social organization of scientific research knows that accessibility depends to a 
degree on proximity. Being at a distance from data archives can do much to hinder the 
potential accessibility of secondary information. This, in turn, can affect the knowledge 
an investigator might have of the types of sources. 
(iv) In secondary data analysis, the investigator has to depend upon data already collected 
for other purposes. Such data must be classified in ways consistent with the objectives of 
the present research. At this stage, several technical problems do arise. The researcher 
may find that the relevant variables are missing or not measured in ways that render them 
useful as data. He may also find that the data, as they exist, don’t conform to the 
categories and classifications envisioned by him. Due to these reasons, serious doubts 
about the overall worth of the secondary data source arise and many researchers may 
prefer to abandon their projects at this point. 


Thus, we see that although secondary data analysis proves very easy and time-saving, it is not 
without its cost. 


Review Questions 


1. What do you mean by a questionnaire? Discuss the characteristics of a good 
questionnaire. 
2. Discuss the relative advantages and disadvantages of different types of questionnaires. 
3. Make a distinction between questionnaire, opinionnaire and schedule. Discuss the 
major functions of a questionnaire. 
4. Discuss the important types of observations. Make a distinction between participant 
observation and nonparticipant observation. 
5. Discuss the methods of content analysis. Point out the important limitations of 
content analysis. 
6. Describe briefly the two main approaches to objective observation and elucidate the 
factors affecting the reliability of observational data. 
7. Discuss the usefulness and limitations of sociometry as a tool of psychological research. 
8. Discuss the major problems in obtaining effective ratings. How can they be resolved? 
9. Discuss the important sources of errors in ratings. 
10. Discuss the major functions of interview. Also, point out the major sources of error 
in interview. 
11. Discuss the relative advantages and limitations of different types of interviews in 
behavioural sciences. 
12. Bring out the utility of semantic differential as a tool of psychological research. 
13. What is meant by rating scale? Discuss the importance of different types of rating scales. 
14. Discuss the various methods of improving the effectiveness of rating scale. 
15. Discuss the important types of errors involved in rating scale. 
16. Discuss the major sources of secondary data. Also, point out the major advantages and 
disadvantages of secondary data. 







13 
SCALING TECHNIQUES 


CHAPTER PREVIEW 
• Distinction between Psychophysical and Psychological Scaling Methods 
• Psychophysical Scaling Methods 
    Method of Limits 
    Method of Constant Stimuli 
    Method of Average Error 
• Weber's Law and Fechner's Law 
• Stevens’ Power Law 
• Newer Psychophysical Methods 
    Method of Category Scaling 
    Method of Magnitude Estimation 
• Psychological Scaling Methods 
    Method of Rank Order 
    Method of Successive Categories 
    Method of Pair Comparisons 
• Methods of Attitude Scales or Opinionnaires 
    Method of Equal-Appearing Intervals 
    Method of Summated Ratings 
    Guttman’s Scale, or Cumulative Scale 




DISTINCTION BETWEEN PSYCHOPHYSICAL AND PSYCHOLOGICAL SCALING 
METHODS 


Scaling methods are methods through which stimuli or individuals are sorted according to some 
specified characteristics or attributes. When these methods are successfully applied 
for such purposes, their results automatically yield a continuum which is called a scale. 
Depending upon the classes of stimuli (to which the scaling methods are applied), scaling 
methods can be grouped into two general classes: psychophysical scaling methods and 
psychological scaling methods. 

In psychophysical scaling methods, the stimuli presented have a physical referent. In other 
words, the stimuli presented can be described by a physical scale in units such as decibels, 
centimetres, minutes, and so on. The obvious purpose of psychophysical scaling methods is to 
investigate the quantitative relationship between the subjective measurement of these stimuli (as 
made by the subject) and the objective measurement by the physical scales. In other words, the 
psychophysical scaling methods intend to discover some definite quantitative relation between 
the physical stimulus and the resulting sensation (or experience) by manipulations of the physical 
dimensions. 

Psychological scaling methods differ from the psychophysical scaling methods in that 
the stimuli they measure are such that they cannot be described by a known physical scale. In other 
words, the stimuli here possess psychological attributes, which obviously lack an appropriate 
physical referent. In the preface to his famous book, Theory and Methods of Scaling, Torgerson 
(1958) has said, “Psychological scaling methods are procedures for constructing scales for the 
measurement of psychological attributes.” For measuring psychological attributes like 
humour, quality of beauty, extent of honesty, aggressiveness, pleasantness, and so on, there are 
no physical scales and, therefore, these attributes are best measured by the psychological scales 
which give accuracy to the measurement of stimuli through the judgemental processes 
of individuals. 

There are different types of psychophysical scaling methods and psychological scaling 
methods. Each of these types will be examined in separate sections. 




PSYCHOPHYSICAL SCALING METHODS 

Suppose that 20 objects of the same size and shape but of different weights are to be arranged in 
an order from the heaviest to the lightest. The obvious way to do this is to weigh them, record the 
weight of each object and arrange them in terms of their measured weights. Further, suppose that 
no scale is available for weighing the objects and they still have to be arranged in an order from 
the heaviest to the lightest. One obvious way to do this is to give all these objects to a group of 
individuals with the request to arrange them in order of heaviness. Still another way may be to 
present all the objects in all the possible pairs and ask each individual to judge which one of each 
pair is heavier. Subsequently, on the basis of the average judgements, all the 20 objects can be 
ordered from the heaviest to the lightest. The scale used for weighing the objects is known as the 
physical scale and the order in which the objects are arranged in terms of the measured weights is 
known as the physical continuum. The order in which the objects are arranged on the basis of 
judgements made by the subjects regarding the weight is known as the psychological continuum, 
Several attempts have been made to investigate the relationship between the ordering of objects 
on a known physical continuum and the ordering of the same objects on a psychological 
continuum formed by judgements of individuals. The methods used in studying such a 
relationship are known as psychophysical scaling methods. 
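
The pairwise procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the object labels, the judgement proportions and the function name `order_by_pair_comparisons` are hypothetical, and summing the proportion of "heavier" judgements is one simple way of averaging the judgements, not a procedure prescribed by the text.

```python
from itertools import combinations


def order_by_pair_comparisons(objects, judgements):
    """Order objects from heaviest to lightest on the psychological continuum.

    judgements maps each pair (a, b) to the proportion of judges who said
    that a is heavier than b (hypothetical input format).
    """
    scores = {obj: 0.0 for obj in objects}
    for a, b in combinations(objects, 2):
        p = judgements[(a, b)]
        scores[a] += p          # credit a with the share of "a heavier" judgements
        scores[b] += 1.0 - p    # credit b with the remaining share
    return sorted(objects, key=lambda obj: scores[obj], reverse=True)


# Hypothetical data for four objects (the text uses 20).
objects = ["W1", "W2", "W3", "W4"]
judgements = {
    ("W1", "W2"): 0.2,
    ("W1", "W3"): 0.7,
    ("W1", "W4"): 0.9,
    ("W2", "W3"): 0.8,
    ("W2", "W4"): 1.0,
    ("W3", "W4"): 0.6,
}

psychological_order = order_by_pair_comparisons(objects, judgements)
print(psychological_order)
```

Comparing such an order with the order obtained from a weighing scale is the kind of comparison between the psychological and the physical continuum that the paragraph describes.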

The term ‘psychophysics’ owes its origin and name to G T Fechner (1801-1887) who 
defined it as “an exact science of the functional relations of dependency between body and 
mind.” For the first time, he set out to explore the quantitative relationship between the 
magnitude of sensation occurring in the mind and the magnitude of the physical stimulus that 
produces the sensation. For investigating the quantitative relationship between the magnitude of 
sensation and the magnitude of physical stimulus, he developed some experimental methods, 
which are still in use today. Before examining these methods, it is essential to explain the 
meaning of some common terms used in psychophysical measurements such as threshold, point 
of subjective equality, point of objective equality, etc. 

The concept of threshold was first introduced by Johann Herbart in 1824 when he defined 
the ‘threshold of consciousness’. The Latin equivalent of the term ‘threshold’ is limen. Threshold 
refers to that boundary value on a stimulus dimension which separates the stimulus that 
produces a response from the stimulus that produces no response or a different response. The 
threshold in psychophysical measurement is ordinarily divided into absolute threshold and 
difference threshold. Absolute threshold, or stimulus threshold (abbreviated to RL from its 
German equivalent Reiz Limen), refers to that minimal stimulus value which produces a response 
50% of the time. A physical stimulus value which is below that minimal value fails to elicit a 
response. Andreas (1960, 99) has defined absolute threshold as “a boundary point in sensation, 
separating sensory experience from no such experience when physical stimulus values reach a 
particular point.” Thus absolute threshold defines the minimum limit for responding to 
stimulation. RL for a single physical stimulus is not the same for different individuals. It varies from 
individual to individual, and sometimes from one situation to another for the same individual. 
Hence, some people may perceive a stimulus at a low value while others may perceive the same 
stimulus at a higher level. That is why RL is statistically defined as the mean of the RL values obtained over 
several trials by the same subject for the same stimulus. 

The difference threshold, or differential 
threshold (abbreviated to DL from its German equivalent Differenz Limen), is the difference 
between the two stimuli which can be perceived 50% of the time. Thus DL defines the 
individual's sensitivity to differences between stimuli. The difference threshold is also 
sometimes referred to as a just noticeable difference (abbreviated to JND), which is the smallest 
difference between the two stimuli that can be detected by the subject. A stimulus must be 
increased or decreased by one JND in order that the change be perceived. Usually, for 
calculating DL, two stimuli are presented to the subject. One of them has a constant value 
throughout its presentation and is known as the standard stimulus (S) and the second stimulus is 
varied throughout its presentation and is known as the comparable stimulus (C). Suppose the 
experimenter takes an S of 100-g weight and starts presenting C at a very low value (say 30 g). 
He may then go on increasing the value of C by a very small value (say at the rate of 5 g in each 
presentation) until it becomes indistinguishable from the S. This is called the lower difference 
threshold. Likewise, the experimenter may start presenting the C at a much higher value than S 
(say, 140 g) and then go on decreasing the C by a very small value (say by 5 g) until it can no 
longer be distinguished from S. This is known as the upper difference threshold. The upper and 
lower thresholds represent the upper and lower limits of the interval of uncertainty (IU), 
respectively. The IU indicates the span or range where the responses of the subjects are uncertain. 
The DL or JND is half the difference between the upper and lower thresholds, that is, 
DL = (Upper threshold − Lower threshold)/2, or IU/2. The DL, like the RL, taken over several trials by the same 
subject for the same physical stimulus is not identical in different situations. 

RL, thus, is a point on a physical stimulus dimension, which is detected 50% of the time, whereas DL is 
not a point; rather, it is a span or distance or range where the amount of change in magnitude of 
the stimuli can be detected 50% of the time. 


The point of subjective equality (PSE) is another important concept used in psychophysical 
measurement. It is defined as that value of C which is, on the average, judged by the subject to 
be equal to the value of S. Surprisingly, PSE is rarely equal or identical to S. It is a very useful 
concept in the measurement of the extent of illusion. In the Müller-Lyer illusion, the extent of 
illusion (or the constant error) is defined as the difference between the PSE and the point of 
objective equality (POE). The POE is defined as the exact value of S. Thus, the constant error 
(CE), which indicates the extent of the Müller-Lyer illusion, is simply the difference between PSE 
and S, and may be expressed by the following formula (Woodworth & Schlosberg 1954): 


CE = PSE − S (13.1) 


Thus when the judgements or discriminations differ significantly from the standard stimulus, 
the presence of the constant error is assumed. If the CE is negative it indicates underjudgement 
and if positive it indicates overjudgement. Thus, if a 50-mm line is used as S (the arrow-headed line), 
and the mean of the subject's settings of the C (the feather-headed line) in the Müller-Lyer 
illusion is 45.5 mm, the CE, and therefore, the extent of illusion, is 45.5 mm − 50 mm = −4.5 mm. 

Because CE is negative, it indicates underjudgement, that is, the feather-headed line is perceived as 
longer than the arrow-headed line. In other words, the length of the arrow-headed line is 
underestimated (at 45.5 mm). Thus the feather-headed line of 45.5 mm appears to be of 50 mm, 
providing evidence for its being perceived as longer than the arrow-headed line. It should be clearly 
held that the constant error is not really constant from trial to trial. The word ‘constant’ does not 
mean that the amount of the error is equal in each successive trial or that the direction of error 
(positive or negative) is consistent over the successive trials for any given subject. Frequently, a 
systematic tendency on the part of the subject to underjudge or overjudge a given comparison 
stimulus with some standard stimulus is found. Such a systematic tendency is known as the 
constant error. Why is then the amount of error not equal in each trial? What causes fluctuation 
from trial to trial? The cause is not known with certainty. It may be due to 
the variations in motivation or attention of the subject or due to the unco-operativeness of the 
subject. Error produced by these factors and the like is known as variable error or chance error. 
Such errors work in both directions and together they tend to cancel each other out. Usually, in 
variable error the source of error is not known with certainty and does not throw the response in 
one constant direction. The term ‘error’ here does not indicate that one judgement is right and 
another is wrong. It simply indicates the extent of fluctuation in judgements from trial to trial. The 
variable error is indexed through the standard deviation (SD) of those responses judged to be 
equal to S. However, three factors, namely, the subject's sensitivity which may vary from moment to 
moment, slight unavoidable changes in the physical characteristics of the stimulus, and changes 
in interest and attitude, seem to be important sources of variable error. 
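
As a rough illustration of formula 13.1 and of indexing the variable error by the SD, the following Python sketch computes PSE, CE and the variable error from a set of settings. The eight settings are hypothetical values chosen so that their mean is 45.5 mm, as in the example above; the function name is illustrative, and the use of the population SD (dividing by N) is an assumption, since the text does not say which form of the SD is intended.

```python
from statistics import mean, pstdev


def constant_and_variable_error(settings, standard):
    """Return (PSE, CE, variable error) for repeated settings of C.

    PSE is taken as the mean setting, CE = PSE - S (formula 13.1),
    and the variable error as the population SD of the settings.
    """
    pse = mean(settings)
    ce = pse - standard
    ve = pstdev(settings)  # assumption: SD computed with N, not N - 1
    return pse, ce, ve


# Hypothetical settings (mm) of the feather-headed line; S is the 50-mm line.
settings_mm = [44.0, 46.5, 45.0, 47.0, 45.5, 44.5, 46.0, 45.5]
pse, ce, ve = constant_and_variable_error(settings_mm, standard=50.0)

print(f"PSE = {pse:.1f} mm, CE = {ce:.1f} mm, variable error = {ve:.2f} mm")
```

A negative CE here corresponds to the underjudgement described in the text, while the variable error only expresses how much the settings fluctuate from trial to trial.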

Basically, there are three classical psychophysical methods which are frequently used. A 
detailed discussion of each is presented separately. 


Method of Limits (also known as Method of Just Noticeable Difference, Method of 
Minimal Changes or Method of Serial Exploration) 




The method of limits is a popular method of determining the threshold. The method was 
named by Kraepelin in 1891 because a series of stimuli ends when the subject has reached that 
limit where he changes his judgement. For computing threshold by this method, two modes of 
presenting stimulus are usually adopted: the increasing mode and the decreasing mode. The 
increasing mode is called the ascending series and the decreasing mode is called the descending 
series. For computing DL, the C is varied in the smallest possible steps in the ascending and descending 
series and the subject is required to say in each step whether the C is smaller (−), equal (=), or 
larger (+) than the S. The data (hypothetical) given in Table 13.2 illustrate the computation of 
DL by the method of limits. For computing RL, no S is needed and the subject simply reports 
whether or not he has detected change in the stimulus presented in the ascending and 
descending series. The data (hypothetical) presented in Table 13.1 illustrate the computation of 
RL. In computing both the DL and the RL, the stimulus sequence is varied with a minimum change 
in its magnitude in each presentation. Hence, Guilford (1954) prefers to call this method the 
method of minimal changes. 







Table 13.1 Determination of the lower threshold of sound intensity by the method of limits 
(sound intensity in arbitrary scale units) 

Series:                  A     D     A     D     A     D     A     D     A     D 
Individual thresholds:   6.5   6.5   8.5   8.5   9.5   7.5   8.5   7.5   8.5   7.5 

Mean ascending threshold (MAT) = (6.5 + 8.5 + 9.5 + 8.5 + 8.5)/5 = 8.3 
Mean descending threshold (MDT) = (6.5 + 8.5 + 7.5 + 7.5 + 7.5)/5 = 7.5 
Mean absolute threshold = (MAT + MDT)/2 = (8.3 + 7.5)/2 = 7.9 
SD = 0.9165 = 0.92 


Table 13.1 illustrates the data taken in 10 series out of which 5 are in the ascending series and 
5 in the descending series. The first series is the ascending series and the experimenter starts 
the sound at 1 scale unit (which is arbitrary), and the subject reports ‘No’ (that is, he does 
not hear it) and accordingly, it is indicated by a minus sign in the series. Then, the experimenter 
goes on increasing the sound by one unit per trial and the subject continues to report ‘No’ until 
the experimenter reaches the seventh unit where the subject changes his response from ‘No’ to 
‘Yes’ (that is, he hears it) and accordingly, it is indicated by a plus sign in the series. Thus, the 
threshold for this particular series lies somewhere between 6 and 7 and hence, the midpoint 6.5 
becomes the threshold and is written below this column as one estimate of RL. Next comes the 
descending series where the experimenter starts the stimulus at 15 units, which is much above 
the threshold found in the previous series. The subject reports ‘Yes’ which is indicated by a plus 
sign. Again, here the experimenter goes on decreasing the stimulus by one unit in each trial until the 
subject changes his response. This time he changes his response from ‘Yes’ to ‘No’ at the sixth 
unit. The midpoint of 6 and 7 is 6.5 which becomes the threshold for this descending series and 
is entered below this column as another estimate of RL. The thresholds thus found in each series 
are also called transition points, above which the subject changes his response in the ascending 
series and below which he also changes his response in the descending series. Such transition 
points have been demarcated in Table 13.1. Thus, several alternate ascending and descending 
series are taken until the experimenter is well satisfied with the relative uniformity of the different 
individual thresholds. In each series the experimenter deliberately changes the starting point so 
that the subject does not fall into a habit of routine guessing, which may facilitate his task. For 
computing RL for the data obtained in 10 series, the individual thresholds entered at the bottom of 
each series are averaged; in our case it is 7.9. The standard deviation of this distribution is 0.92 
which indicates variability in the subject’s performance due to some variable errors like changes 
in his motivation, interest or attention, etc. 
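
The computation of RL described above can be retraced with a short Python sketch. It assumes the ten individual thresholds of Table 13.1 are 6.5, 6.5, 8.5, 8.5, 9.5, 7.5, 8.5, 7.5, 8.5 and 7.5, with ascending and descending series alternating and the first series ascending; the function name `transition_threshold` and the '+'/'-' coding of responses are illustrative choices, and the SD is computed with N in the denominator because that reproduces the 0.92 quoted in the text.

```python
from statistics import mean, pstdev


def transition_threshold(levels, responses):
    """Midpoint of the two adjacent stimulus levels where the response changes.

    levels and responses are listed in the order of presentation;
    responses are '+' (Yes) or '-' (No).
    """
    for i in range(1, len(responses)):
        if responses[i] != responses[i - 1]:
            return (levels[i - 1] + levels[i]) / 2
    raise ValueError("the response never changes in this series")


# First ascending series of the text: 'No' from 1 to 6 units, 'Yes' at 7.
first_estimate = transition_threshold(list(range(1, 8)), ["-"] * 6 + ["+"])

# First descending series: 'Yes' from 15 down to 7 units, 'No' at 6.
second_estimate = transition_threshold(list(range(15, 5, -1)), ["+"] * 9 + ["-"])

# Individual thresholds of the ten series (A, D, A, D, ...).
thresholds = [6.5, 6.5, 8.5, 8.5, 9.5, 7.5, 8.5, 7.5, 8.5, 7.5]

mat = mean(thresholds[0::2])   # mean ascending threshold
mdt = mean(thresholds[1::2])   # mean descending threshold
rl = mean(thresholds)          # mean absolute threshold (RL)
sd = pstdev(thresholds)        # variability of the individual thresholds

print(f"MAT = {mat:.1f}, MDT = {mdt:.1f}, RL = {rl:.1f}, SD = {sd:.2f}")
```

Because the numbers of ascending and descending series are equal, averaging all ten thresholds gives the same RL as averaging MAT and MDT.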

Besides these variable errors, the RL may also be affected by two constant errors: the error of 
habituation and the error of anticipation. The error of habituation (sometimes called the error of 
perseverance) may be defined as a tendency of the subject to go on saying ‘Yes’ in a descending 
series or ‘No’ in an ascending series. In other words, when the error of habituation is in operation, 
the subject falls into a habit of giving certain responses even after a clear change in the stimulus 
has occurred. One natural consequence of this error is to inflate the mean of the ascending series 
over the mean of the descending series. The error of anticipation (sometimes also called the error 
of expectation) is the opposite of the error of habituation and accordingly, may be defined as the 
tendency to expect a change from ‘Yes’ to ‘No’ in the descending series and ‘No’ to ‘Yes’ in the 
ascending series before the change in stimulus is apparent. The conscious tendency that works in 
the mind of the subject when such an error is in operation is that he has said ‘Yes’ many times 
and, therefore, should now say ‘No’ (in the descending series); likewise, he thinks that he has said 
‘No’ many times and, therefore, he should now say ‘Yes’ (in the ascending series). The natural 
consequence of such an error is to inflate the mean of the descending series over the mean of the 
ascending series. The primary purpose of giving alternate ascending and descending series is to 
cancel out these two types of constant errors. Table 13.1 shows that the mean of the ascending 
series is 8.3, which is higher than the mean of the descending series (that is, 7.5). It means that the 
subject has committed the error of habituation though its magnitude is very small, that is, 0.8 
only. Since these two types of constant errors work in opposite directions, both cannot exist 
within the same subject. Practice and fatigue may affect the data obtained by the method of 
limits. These two effects may be shown by comparing the mean of the first half with the mean of 
the second half of the total series taken. Guilford (1954) has recommended that these effects can 
be analyzed in an even better way by the ANOVA (analysis of variance). 


Table 13.2 Determination of the difference threshold for weightlifting by the method of limits (data are hypothetical)

Comparative stimuli: 110, 109, 108, 107, 106, S_t = 105, 104, 103, 102, 101 and 100 g, each judged ‘+’ (heavier), ‘=’ (equal) or ‘−’ (lighter) in ten alternating ascending (A) and descending (D) series.

Series              1      2      3      4      5      6      7      8      9      10
Direction           A      D      A      D      A      D      A      D      A      D
Upper threshold   107.5  106.5  106.5  105.5  106.5  106.5  106.5  105.5  107.5  105.5
Lower threshold   105.5  104.5  104.5  103.5  104.5  104.5  104.5  103.5  105.5  104.5
Mean              106.5  105.5  105.5  104.5  105.5  105.5  105.5  104.5  106.5  105.0

$$\text{Mean upper threshold} = \frac{1064}{10} = 106.4$$

$$\text{Mean lower threshold} = \frac{1045}{10} = 104.5$$

$$\text{Interval of uncertainty} = 106.4 - 104.5 = 1.9$$

$$\text{Difference limen} = \frac{\text{Interval of uncertainty}}{2} = \frac{1.9}{2} = 0.95$$

$$\text{Point of subjective equality} = \frac{106.4 + 104.5}{2} = 105.45$$

$$\text{Constant error} = PSE - S_t = 105.45 - 105 = 0.45$$

Mean ascending series = 105.9

Mean descending series = 105.0





Table 13.2 illustrates the calculation of DL of the data obtained in an experiment on weightlifting (the data are hypothetical). The S_t is set at 105 g and there are 10 C_o stimuli, 5 below and 5 above the S_t. All C_o stimuli are of the same size and of the same texture. There are 10 series, of which 5 are ascending and 5 are descending series. The first trial is of the ascending series and the experimenter starts with a C_o of 100 g weight, which is much below the S_t. The subject reports it to be lighter than the S_t and hence, the experimenter records it by putting a minus sign. In this way, the C_o is increased by one gram weight in each presentation. The first change in response by the subject from ‘lighter’ to ‘equal’ occurs at 106 g weight and accordingly, this is noted down by putting the equality sign. The experimenter does not stop here but continues the series. After a few more presentations, the subject reports the change in response from ‘equal’ to ‘heavier’ at 108 g weight, which is noted by the experimenter by putting down the plus sign. The first change in response from ‘lighter’ to ‘equal’ is the lower threshold of change and the second change in response from ‘equal’ to ‘heavier’ is the upper threshold of change for that series. The threshold of change is the midpoint of the two stimuli between which the change occurs. Thus the upper threshold is 107.5 (the midpoint of 107 and 108) and the lower threshold is 105.5 (the midpoint of 105 and 106). The same procedure is repeated in descending series except that the experimenter starts presenting the C_o from 110 g, which is clearly judged as heavier than the S_t and accordingly, noted by putting down a plus sign. The first change in response from ‘heavier’ to ‘equal’ in descending series (the second series) occurs at 106 g weight and the second change in response from ‘equal’ to ‘lighter’ occurs at 104 g weight. Accordingly, the upper threshold is 106.5 and the lower threshold is 104.5. The same procedure is repeated in other ascending and descending series.

Table 13.2 shows that the entire range of comparison values may be divided into three parts: an upper part where the plus judgements dominate, a lower part where the minus judgements dominate, and a middle part where equal judgements dominate. The middle part where equal judgements dominate is the interval of uncertainty (IU) and covers two DLs or JNDs from minus to equal and from equal to plus in the ascending series, and from plus to equal and from equal to minus in the descending series. The DL is half the IU, which is 0.95, and IU is the difference between the mean of the upper threshold (106.4) and the mean of the lower threshold (104.5). A DL of 0.95 indicates that the subject can correctly perceive the difference between the C_o and the S_t if the former is increased or decreased by a margin of 0.95 g. The midpoint of IU, (106.4 + 104.5)/2 = 105.45, becomes the point of subjective equality (PSE). The difference between the PSE and the S_t indicates the constant error (CE), which may be either positive or negative. If the PSE is above the S_t, the CE is positive and if the PSE is below the S_t, the CE is negative. In Table 13.2 a positive CE of 0.45 has been obtained, which indicates that the subject on the average has overestimated the standard weight of 105 g to the tune of 105.45 g. A negative CE indicates underestimation of the standard stimulus and reversely, the overestimation of variable stimulus. The mean of ascending series, which is calculated on the basis of the mean of the upper and the lower threshold of each ascending series, is higher than the mean of the descending series by 0.9. This difference indicates that the judgements of the subject have been influenced by the error of habituation to that extent. Since the difference is negligible (that is, 0.9 only), it can be assumed that the judgements are not being influenced in a significant manner.
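The arithmetic behind Table 13.2 can be retraced with a short script. The following is only a minimal sketch: the function name `limits_summary` and the dictionary keys are illustrative choices, not part of the original text, and the odd-numbered series are assumed to be the ascending ones, as the description of the first two series suggests.

```python
from statistics import mean

# Upper and lower thresholds of the ten series in Table 13.2 (grams).
upper = [107.5, 106.5, 106.5, 105.5, 106.5, 106.5, 106.5, 105.5, 107.5, 105.5]
lower = [105.5, 104.5, 104.5, 103.5, 104.5, 104.5, 104.5, 103.5, 105.5, 104.5]
STANDARD = 105.0  # S_t in grams


def limits_summary(upper, lower, standard):
    """Return IU, DL, PSE and CE from per-series upper and lower thresholds."""
    mean_upper = mean(upper)
    mean_lower = mean(lower)
    iu = mean_upper - mean_lower          # interval of uncertainty
    return {
        "mean_upper": mean_upper,
        "mean_lower": mean_lower,
        "iu": iu,
        "dl": iu / 2,                      # difference limen
        "pse": (mean_upper + mean_lower) / 2,
        "ce": (mean_upper + mean_lower) / 2 - standard,
    }


summary = limits_summary(upper, lower, STANDARD)

# Per-series midpoints; series 1, 3, 5, ... are assumed to be ascending.
series_means = [(u + l) / 2 for u, l in zip(upper, lower)]
ascending_mean = mean(series_means[0::2])
descending_mean = mean(series_means[1::2])

print(summary)
print(ascending_mean, descending_mean)
```

Comparing `ascending_mean` with `descending_mean` is the check for habituation or anticipation described above: a higher ascending mean points to habituation, a higher descending mean to anticipation.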


Method of Constant Stimuli (also known as Method of Right and Wrong Cases or Method of Frequency)

In this method, a number of fixed or constant stimuli are presented to the subject several times in a random order. The method of constant stimuli can also be employed for determining the RL or DL. For determining RL, the different values of the stimulus are presented to the subject in a random order (and not in an order of regular increase or decrease as is done in the method of limits), and he has to report each time whether he perceives or does not perceive the stimulus. Though the different values of stimulus are presented irregularly, the same values are presented throughout the experiment a large number of times, usually from 50 to 200 times each, in a predetermined order unknown to the subjects. The mean of the reported values of the stimulus becomes the index of RL. The procedure involved is known as the method of constant stimuli. Table 13.3 illustrates the computation of RL by this method. For calculating DL, in each presentation the two stimuli (one standard and one variable) are presented to the subject simultaneously or in succession (Guilford 1954, 118). On each trial, the subject is required to say whether one stimulus is ‘greater’ or ‘less’ than the other. In case of uncertainty, he reports ‘Doubtful’ or ‘Equal’ and in such a situation sometimes the experimenter forces him to guess in order to avoid doubtful judgements. The procedure involved is known as the method of constant stimulus differences and not the method of constant stimuli. If the S_t and the C_o are to be presented in succession, for half of the trials the S_t is presented first and for the remaining half, the order is reversed. This is done to control a constant error (that is, time error), which may occur if the S_t is presented either before or after the C_o throughout the trials. In this sense, the method of constant stimulus differences differs from the method of limits where the C_o are presented by regularly increasing (ascending series) and decreasing (descending series) their value. In the method of constant stimuli, a smaller value of the C_o may be abruptly followed by a larger value or vice versa. The subject cannot estimate the likely C_o to be given for making a judgment. Though the different C_o are presented in an irregular order, they remain constant throughout the presentations in the experiment, that is, the same values are presented throughout all presentations in the experiment in an irregular order.

This method is also known as the method of right or wrong cases because in each case, the subject has to report whether he perceives the stimulus (right) or he does not perceive the stimulus (wrong). But this name is almost obsolete now. This method has one distinct advantage over the method of limits. Since in the method of limits the stimuli are presented in a regular increasing or decreasing manner, the two constant errors, that is, the error of habituation and the error of expectation, are inevitable. But these two errors are safely avoided in the constant methods because the presentations of the stimuli are random or irregular.

Table 13.3 illustrates the determination of RL from the hypothetical data taken from a 
hypothetical subject. 

Table 13.3 Determination of the lower absolute threshold for sound intensity

Stimulus value    Frequencies showing the         Percentage of times
(Hertz tone)      number of times tone heard      tone heard
50                98                              98
48                80                              80
46                70                              70
44                65                              65
42                43                              43
40                30                              30
38                20                              20
36                15                              15
34                12                              12
32                3                               3


In this hypothetical experiment which yielded the data of Table 13.3, the subject was presented 10 sound intensities, each 100 times. He was to report whether or not he heard the sound. Doubtful judgements were not allowed. Against each sound intensity are given the frequencies with which the subject reported the sound being heard clearly. (Table 13.3 shows only the frequencies with which each sound intensity was perceived.) Subsequently, the frequencies were converted into percentages. Since there were 100 judgements for each stimulus intensity, the number of times the sound intensity was heard and the percentage of the time the sound was heard were identical. There are several methods for calculating RL, out of which three most common ones will be discussed here. These three methods are the graphical method, the linear interpolation method and the summation method.

The graphical method is a rough method of calculating RL. Here the stimulus dimensions are located on the abscissa and the percentages showing the number of times the stimulus was perceived are located on the ordinate of the graph. As we know, RL is that point of the stimulus dimension which is perceived 50% of times and not perceived in the remaining 50% of times. Accordingly, we go from the 50% point on the ordinate across to the curve and then a straight line is dropped on the abscissa. The RL is assumed to be located on the point where the perpendicular line hits the abscissa (see Figure 13.1). The RL, thus found, is 42.60, which is the point that produces response (report of hearing the sound) 50% of the time.

Fig. 13.1 Graphical determination of the absolute threshold for sound intensity (abscissa: stimulus value; ordinate: per cent of times tone heard)


The linear interpolation method is more precise and exact than the graphical method. In this method, the median (or Q2) of the distributions of judgment is found by interpolation. While calculating RL from the data given in Table 13.3, our primary interest is to know that stimulus value which cuts the distributions of judgements into two equal halves so that above that point we get half of the increasing frequencies of the sound being heard, and below which we get the decreasing frequencies of the sound being heard. An inspection of Table 13.3 reveals that the point falls in between the stimulus value of 42 (43%) and 44 (65%). The difference between 65 and 43 is 22 and if we go 7 points above the percentage of 43, we get 50% because 43 plus 7 is 50. By interpolation, then, we calculate (7/22) × 2, because the step interval between the stimulus values of 42 and 44 is 2. The resulting value is 0.636 which, if added to 42, becomes 42.636 or 42.64. Thus the RL by the linear interpolation method is 42.64, which is approximately equal to 42.60 found by the graphical method. The method assumes a straight line (or linear) relationship between the two stimulus values used in the interpolation and hence, it is called the method of linear interpolation.
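The interpolation step can be written out as a small routine. This is a sketch under the assumption that the percentages rise with the stimulus value; the function name `interpolate_threshold` is hypothetical and the data are those of Table 13.3, listed from the lowest to the highest stimulus.

```python
def interpolate_threshold(stimuli, percents, target=50.0):
    """Linearly interpolate the stimulus value at which `target` per cent is reached.

    `stimuli` must be in ascending order and `percents` must hold the
    corresponding percentages of 'heard' judgements.
    """
    points = list(zip(stimuli, percents))
    for (s_lo, p_lo), (s_hi, p_hi) in zip(points, points[1:]):
        # Find the pair of neighbouring stimuli that brackets the target.
        if p_lo <= target <= p_hi and p_hi > p_lo:
            fraction = (target - p_lo) / (p_hi - p_lo)
            return s_lo + fraction * (s_hi - s_lo)
    raise ValueError("target percentage is not bracketed by the data")


# Data of Table 13.3, ordered from the lowest to the highest stimulus value.
stimuli = [32, 34, 36, 38, 40, 42, 44, 46, 48, 50]
percents = [3, 12, 15, 20, 30, 43, 65, 70, 80, 98]

rl = interpolate_threshold(stimuli, percents)
print(round(rl, 2))
```

Only the two percentages that bracket 50% (43 and 65) enter the result, which is the point raised in the criticism of this method.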
The linear interpolation method has been criticized on some grounds. First, the assumption of the method that there exists a linear relationship between the two stimulus values has been a controversial one. Second, the method does not fully utilize all data in computation of the median point. In the above example, only two of ten judgements, namely, 43% and 65%, have been used. Third, no correct estimate of dispersion of judgements can be made in this method. As a consequence, the dependability upon RL thus calculated is poor. Guilford (1954) has, however, proposed an indirect way of calculating the standard deviation (a measure of dispersion) from Q or the semi-interquartile range from data obtained by the method of constant stimuli. The first step consists in finding the stimulus value corresponding to p = 0.25 (or Q1) and p = 0.75 (or Q3). Then one half of the difference between Q3 and Q1 becomes Q or the semi-interquartile range and, assuming the distribution to be a normal one, SD = 1.4833 × Q.

The summation method proposed by Woodworth (1938) is another way of calculating RL. This method is most suited where the terminal proportions reach 0 and 1.00 or when it can be assumed that the next terminal proportions are close to 0 and 1.00. In the method of summation, the mean (which shows the RL) and the SD (which shows the dispersion of judgements) can be estimated by the equations given below:


$$M = S_h - 0.5i - i\sum p \qquad (13.2)$$

$$M = S_l + 0.5i + i\sum d \qquad (13.3)$$

where S_h = stimulus value where p = 1.00 (in case p is not 1.00, it can be assumed at the next stimulus value to be so);
i = step interval of the stimulus value;
p = proportions of judgement in higher or superior judgement category;
d = 1 − p (or proportions of judgements in lower or inferior judgment category);
S_l = stimulus value where p = 0, that is, d = 1.00 (in case d is not 1.00, it can be assumed at the next stimulus value to be so).

A close inspection of Equations 13.2 and 13.3 reveals that Equation 13.2 estimates the mean on the basis of p and Equation 13.3 estimates the mean on the basis of d. Since p and d are simply two ways of expressing the same proportion, the mean calculated by the two methods should be exactly the same. Some investigators have preferred to write Equation 13.2 as Mean = apical value − i/2 − i Σp and Equation 13.3 as Mean = basal value + i/2 + i Σd, where apical value is defined as that stimulus value where all responses given by the subject belong to the same category and below which the responses are divided into categories such as ‘Heavier’ or ‘Lighter’, ‘Heard’ or ‘Not heard’, ‘Yes’ or ‘No’, etc., and the basal value is that stimulus value where all responses given by the subject belong to the same category and above which they are divided into different categories. The meaning of the rest of the symbols is as usual. When the mean is calculated by Equation 13.2, the SD should be calculated by Equation 13.4:

$$SD = i\sqrt{2\sum cp - \left(\sum p\right)^2 - \sum p} \qquad (13.4)$$

When the mean is calculated by Equation 13.3, the SD should be calculated by Equation 13.5:

$$SD = i\sqrt{2\sum cd - \left(\sum d\right)^2 - \sum d} \qquad (13.5)$$

In Equation 13.4 the symbols are defined as in Equation 13.2 except cp, which has proportions p cumulated from low to high values of p, and similarly, in Equation 13.5 the symbols are defined as usual except cd, which has proportions d cumulated from low to high values of d. Table 13.4 illustrates the calculation of RL by the summation method.


Table 13.4 Determination of lower absolute threshold for sound intensity by the summation method (data are replicated from Table 13.3)

Stimulus value    p (proportion of    d (proportion of      cp       cd
(Hertz tone)      time heard)         time not heard)
50                0.98                0.02                  4.36     0.02
48                0.80                0.20                  3.38     0.22
46                0.70                0.30                  2.58     0.52
44                0.65                0.35                  1.88     0.87
42                0.43                0.57                  1.23     1.44
40                0.30                0.70                  0.80     2.14
38                0.20                0.80                  0.50     2.94
36                0.15                0.85                  0.30     3.79
34                0.12                0.88                  0.15     4.67
32                0.03                0.97                  0.03     5.64
Sum               4.36                5.64                  15.21    22.25








S_h = 52 (assumed), S_l = 30 (assumed)

$$M = S_h - 0.5i - i\sum p = 52 - (0.5)(2) - (2)(4.36) = 52 - 1 - 8.72 = 42.28$$

$$M = S_l + 0.5i + i\sum d = 30 + (0.5)(2) + (2)(5.64) = 30 + 1 + 11.28 = 42.28$$

Similarly,

$$M = \text{apical value} - i/2 - i\sum p = 52 - 2/2 - 2(4.36) = 52 - 1 - 8.72 = 42.28$$

$$M = \text{basal value} + i/2 + i\sum d = 30 + 2/2 + (2)(5.64) = 30 + 1 + 11.28 = 42.28$$

$$SD = i\sqrt{2\sum cp - \left(\sum p\right)^2 - \sum p} = 2\sqrt{2(15.21) - (4.36)^2 - 4.36} = 5.31$$

$$SD = i\sqrt{2\sum cd - \left(\sum d\right)^2 - \sum d} = 2\sqrt{2(22.25) - (5.64)^2 - 5.64} = 5.31$$


It is obvious that the RL calculated by the summation method (42.28) is very close (except the fractional differences which may be due to the rounding errors) to RL calculated by the graphical method (42.60) and the linear interpolation method (42.64), and this checks the accuracy of the calculation.
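The summation method lends itself to a direct check by computer. The sketch below applies Equations 13.2 to 13.5 to the proportions of Table 13.4; the function names `summation_from_p` and `summation_from_d` are illustrative only, and the terminal values 52 and 30 are the assumed ones used in the worked example.

```python
from itertools import accumulate
from math import sqrt


def summation_from_p(p_low_to_high, step, s_high):
    """Mean and SD by Equations 13.2 and 13.4 (p ordered from low to high)."""
    sum_p = sum(p_low_to_high)
    sum_cp = sum(accumulate(p_low_to_high))   # p cumulated from low to high
    mean = s_high - 0.5 * step - step * sum_p
    sd = step * sqrt(2 * sum_cp - sum_p ** 2 - sum_p)
    return mean, sd


def summation_from_d(d_low_to_high, step, s_low):
    """Mean and SD by Equations 13.3 and 13.5 (d ordered from low to high)."""
    sum_d = sum(d_low_to_high)
    sum_cd = sum(accumulate(d_low_to_high))   # d cumulated from low to high
    mean = s_low + 0.5 * step + step * sum_d
    sd = step * sqrt(2 * sum_cd - sum_d ** 2 - sum_d)
    return mean, sd


# Proportions 'heard' for stimulus values 32, 34, ..., 50 (Table 13.4).
p = [0.03, 0.12, 0.15, 0.20, 0.30, 0.43, 0.65, 0.70, 0.80, 0.98]
# d = 1 - p, reordered so that it also runs from low to high values.
d = sorted(1 - value for value in p)

mean_p, sd_p = summation_from_p(p, step=2, s_high=52)
mean_d, sd_d = summation_from_d(d, step=2, s_low=30)
print(round(mean_p, 2), round(sd_p, 2))
print(round(mean_d, 2), round(sd_d, 2))
```

Both routes use every proportion in the table, which is the advantage of this method over linear interpolation.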

The calculation of DL may also be done through the graphical method, the linear interpolation method and the summation method. Suppose the experimenter wants to determine the DL for lifted weights in which three categories of judgements (heavier, lighter and equal) are obtained. Let the S_t be 100 g and the C_o be 70 g, 80 g, 90 g, 100 g, 110 g, 120 g and 130 g. Each of these C_o values is paired with S_t 100 times in a random order and the subject is instructed to judge whether the second stimulus is heavier, lighter or equal to the first one. The resulting frequencies and proportions of the judgements have been shown in Table 13.5. (The experiment is hypothetical.)
Table 13.5 Frequencies and proportions of three categories of judgements in a weightlifting experiment where standard stimulus is 100 g

Stimulus          Frequencies                     Proportions
values        Heavier   Equal   Lighter     p (Heavier)   m (Equal)   d (Lighter)
130           90        8       2           0.90          0.08        0.02
120           84        12      4           0.84          0.12        0.04
110           78        18      4           0.78          0.18        0.04
S_t 100       40        50      10          0.40          0.50        0.10
90            35        20      45          0.35          0.20        0.45
80            10        10      80          0.10          0.10        0.80
70            0         1       99          0.00          0.01        0.99
Sum                                         3.37          1.19        2.44

S_l = 60 (assumed); S_h = 140 (assumed)

The distribution of the judgements ‘heavier’, ‘equal’ and ‘lighter’ has been shown in Figure 13.2, which may be considered as a typical distribution in the field of sensitivity. For the graphical method, two curves (technically called the psychometric functions) have been drawn in Figure 13.3.





Fig. 13.2 Distribution of three categories of judgement made by subjects (abscissa: comparable stimulus value, 70 to 130; curves for ‘Lighter’ and ‘Heavier’)


One psychometric function or curve is for the judgements of ‘heavier’, and another psychometric function or curve is for the judgements of ‘lighter’.

These two curves are shown as ascending functions of the proportions of the two types of the judgements and divide the areas representing the three categories of judgements into three separate regions (see Figure 13.3). It is obvious from Figure 13.3 that the upper threshold or limen (L_u) is 102.60 and the lower threshold or limen (L_l) is 88.60.


Fig. 13.3 Areas representing the three categories of judgement (‘Lighter’, ‘Equal’ and ‘Heavier’) which divide the transition zones into three categories (abscissa: comparable stimulus value, 70 to 130)
These two limens mark the interval of uncertainty or IU and therefore, DL is (102.60 − 88.60)/2 = 7 and the point of subjective equality or PSE is (102.6 + 88.6)/2 = 95.6. Similarly, by linear interpolation the upper limen, which is expected to fall between the stimulus values 100 and 110 (because the proportion 0.50 lies therein), can be calculated like this: The difference in the proportions of ‘heavier’ response at these two limits of stimulus value is 0.38 and if 0.10 is added to 0.40, we get 0.50. Hence, (0.10/0.38) × 10 = 2.63, which is added to 100 to yield 102.63, the upper limen (L_u) we require. Likewise, the lower limen (L_l), that is, of ‘lighter’ response, is expected to fall in between the stimulus values 90 and 80. By interpolation, we get (0.05/0.35) × 10 = 1.42, which if subtracted from 90 leaves 88.58, the value of the lower limen we require. Then DL is (102.63 − 88.58)/2 = 7.025 and PSE is (102.63 + 88.58)/2 = 95.605. These values are approximately equal to those found by the graphical method and hence, check the accuracy of the calculation.

The DL can also be calculated by Woodworth’s summation method. The upper limen by Equation 13.2 is 140 − (0.5)(10) − (10)(3.37) = 101.3 and the lower limen by Equation 13.3 is equal to 60 + (0.5)(10) + (10)(2.44) = 89.4. The DL is (101.3 − 89.4)/2 = 5.95 and the PSE is (101.3 + 89.4)/2 = 95.35. The DL can also be estimated from the sum of proportions of ‘equal’ or ‘doubtful’ judgements. Here, IU = i Σm and DL, as usual, is half of IU. In Table 13.5, IU is 10 × 1.19 = 11.9, the DL being half of this, which becomes 5.95. It is obvious that some minor variations in the value of DL and PSE occur when different methods of calculation are adopted.
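The three-category calculations can be compared side by side in a few lines. The sketch below is illustrative only: the helper `crossing` is a hypothetical name, the proportions are those of Table 13.5 (their column sums agree with the totals 3.37, 1.19 and 2.44 used in the text), and the terminal values 140 and 60 are the assumed ones. Because the interpolated values are carried at full precision rather than rounded at each step, they differ from the figures in the text in the second decimal place.

```python
STEP = 10
stimuli = [70, 80, 90, 100, 110, 120, 130]
heavier = [0.00, 0.10, 0.35, 0.40, 0.78, 0.84, 0.90]
equal = [0.01, 0.10, 0.20, 0.50, 0.18, 0.12, 0.08]
lighter = [0.99, 0.80, 0.45, 0.10, 0.04, 0.04, 0.02]


def crossing(stimuli, props, target=0.5):
    """Stimulus value at which a rising or falling curve crosses `target`."""
    points = list(zip(stimuli, props))
    for (s0, p0), (s1, p1) in zip(points, points[1:]):
        if min(p0, p1) <= target <= max(p0, p1) and p0 != p1:
            return s0 + (target - p0) / (p1 - p0) * (s1 - s0)
    raise ValueError("target proportion is not bracketed by the data")


# Linear interpolation: 50% points of the 'heavier' and 'lighter' curves.
upper_interp = crossing(stimuli, heavier)
lower_interp = crossing(stimuli, lighter)
dl_interp = (upper_interp - lower_interp) / 2
pse_interp = (upper_interp + lower_interp) / 2

# Summation method (Equations 13.2 and 13.3) with assumed terminal values.
S_HIGH, S_LOW = 140, 60
upper_sum = S_HIGH - 0.5 * STEP - STEP * sum(heavier)
lower_sum = S_LOW + 0.5 * STEP + STEP * sum(lighter)
dl_sum = (upper_sum - lower_sum) / 2
pse_sum = (upper_sum + lower_sum) / 2

# IU estimated from the 'equal' judgements alone.
iu_equal = STEP * sum(equal)

print(round(upper_interp, 2), round(lower_interp, 2))
print(round(upper_sum, 1), round(lower_sum, 1), round(iu_equal / 2, 2))
```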


The DL can also be estimated from the two-category judgement. Suppose the same weightlifting experiment was carried on a new subject who gave his judgement with respect to the second stimulus as being ‘heavier’ or ‘lighter’ only. Sometimes, it happens that even when the subject is forced to give a two-category judgement, he reports ‘doubtful’ or ‘equal’ responses. In such a situation, the ideal approach is to divide the number of ‘doubtful’ or ‘equal’ responses equally between the two categories such as ‘heavier’ or ‘lighter’ at each stimulus value. The rationale behind this is that the subject gives ‘doubtful’ or ‘equal’ responses only in situations which are equally inclined towards ‘heavier’ or ‘lighter’. Others, however, reject this view and prefer to distribute the doubtful judgements in proportion to the numbers of judgements already made towards the two categories. This is known as the proportional division of ‘equal’ or ‘doubtful’ judgements. In this experiment, let us suppose that no ‘doubtful’ or ‘equal’ responses were made by the subject and hence, the distribution of the frequencies and proportions resulting from 100 pair presentations of each C_o with S_t was as given in Table 13.6.
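The two ways of redistributing ‘equal’ or ‘doubtful’ judgements can be contrasted with a brief sketch. The function names are hypothetical, and the frequencies (40 heavier, 50 equal, 10 lighter) are borrowed from the 100 g row of Table 13.5 purely for illustration; the text itself does not carry out this redistribution.

```python
def split_equal(heavier, equal, lighter):
    """Divide the 'equal' judgements evenly between the two categories."""
    return heavier + equal / 2, lighter + equal / 2


def split_proportional(heavier, equal, lighter):
    """Divide the 'equal' judgements in proportion to the decided judgements."""
    decided = heavier + lighter
    if decided == 0:
        # No decided judgements to take proportions from: fall back to halves.
        return equal / 2, equal / 2
    return (heavier + equal * heavier / decided,
            lighter + equal * lighter / decided)


# Illustrative frequencies: 40 'heavier', 50 'equal', 10 'lighter'.
equal_division = split_equal(40, 50, 10)
proportional_division = split_proportional(40, 50, 10)
print(equal_division, proportional_division)
```

The two rules can give quite different two-category frequencies when the decided judgements are lopsided, which is why the choice between them is contested.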


Table 13.6 Distribution of the frequencies and proportions of two-category judgements in a weightlifting experiment where the standard stimulus is 100 g

Stimulus          Frequencies              Proportions
values        Heavier    Lighter       Heavier    Lighter
130           98         2             0.98       0.02
120           85         15            0.85       0.15
110           70         30            0.70       0.30
S_t 100       62         38            0.62       0.38
90            40         60            0.40       0.60
80            10         90            0.10       0.90
70            1          99            0.01       0.99


The reason why experimenters, in general, prefer two-category judgements to three-category judgements is due to certain advantages of the former over the latter (Postman & Egan 1949). The first advantage is that in the two-category judgement, where the subject is forced to guess by restricting his judgement to either ‘greater’ or ‘less’, he will be more often right than wrong in his guessing.

The second advantage is that when a neutral category like ‘equal’ is used, the meaning of the category becomes vague and confusing. Not only this, but its meaning also varies from subject to subject. When the subject is forced to give his judgement in either of the two categories, such difficulties are automatically removed. In view of these advantages, it is suggested that the two-category judgement be used in general.

Fig. 13.4 Distribution of two-category judgement made by subjects


In computing DL from a two-category judgement by the graphical method and the linear interpolation method, the statistical treatments are different from those used in the three-category judgement. Here the linear interpolation method involves the calculation of Q1 and Q3. Q1 and Q3 will be the lower difference threshold and the upper difference threshold respectively. The range between Q1 and Q3 is called the interquartile range and is equivalent to IU. As DL is half of IU, the range between Q1 and Q3, if divided by 2, yields the value of Q (or the semi-interquartile range) or the difference threshold. Q1 (or 25th percentile) and Q3 (or 75th percentile) from the data of Table 13.6 may be calculated like this:

$$Q_1 = 80 + 10 \times \frac{25 - 10}{40 - 10} = 80 + 10 \times \frac{15}{30} = 80 + 10(0.5) = 85$$

$$Q_3 = 110 + 10 \times \frac{75 - 70}{85 - 70} = 110 + 10 \times \frac{5}{15} = 110 + 10(0.333) = 113.33$$

Here, 110 is the stimulus value of the step above which Q3 (or the 75th per cent) falls; 10 is the step interval; 75 is the per cent required for Q3; 85 is the per cent for the next higher stimulus value (120), below which Q3 lies; and 70 is the per cent for that stimulus value above which Q3 lies. A similar interpretation is applicable for those values on the basis of which Q1 is calculated. Now DL and PSE can be calculated as usual:

$$DL = \frac{\text{upper threshold} - \text{lower threshold}}{2} = \frac{IU}{2} = \frac{113.33 - 85}{2} = \frac{28.33}{2} = 14.165$$

$$PSE = \frac{\text{upper threshold} + \text{lower threshold}}{2} = \frac{113.33 + 85}{2} = 99.165$$
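The quartile calculation for the two-category data can be sketched as follows. The function name `percentile_point` is an illustrative choice, and the percentages of ‘heavier’ judgements are those of Table 13.6 listed from the lightest to the heaviest comparison weight; since Q3 is kept at full precision, DL and PSE differ from the rounded figures above only in the third decimal place.

```python
def percentile_point(stimuli, percents, target):
    """Stimulus value at which the rising curve reaches `target` per cent."""
    points = list(zip(stimuli, percents))
    for (s_lo, p_lo), (s_hi, p_hi) in zip(points, points[1:]):
        if p_lo <= target <= p_hi and p_hi > p_lo:
            return s_lo + (target - p_lo) / (p_hi - p_lo) * (s_hi - s_lo)
    raise ValueError("target percentage is not bracketed by the data")


# Per cent of 'heavier' judgements for each comparison weight (Table 13.6).
stimuli = [70, 80, 90, 100, 110, 120, 130]
percent_heavier = [1, 10, 40, 62, 70, 85, 98]

q1 = percentile_point(stimuli, percent_heavier, 25)   # lower difference threshold
q3 = percentile_point(stimuli, percent_heavier, 75)   # upper difference threshold
dl = (q3 - q1) / 2                                     # semi-interquartile range
pse = (q3 + q1) / 2
print(round(q1, 2), round(q3, 2), round(dl, 3), round(pse, 3))
```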


The DL by the graphical method is calculated by drawing a freehand smoothed curve that passes close to the data points, leaving some of them above and some of them below the line, and by dropping two perpendiculars at the abscissa from the 25% (or 0.25) and 75% (0.75) points located on the y-axis and joined by two separate horizontal lines (see Figure 13.5). The distribution of the judgements ‘heavier’ and ‘lighter’ has been shown in Figure 13.4.

It is obvious from Figure 13.5 that the perpendicular dropped from 75% touches the abscissa at 113.4 g, which becomes the upper difference threshold (L_u), and the perpendicular dropped from 25% touches the abscissa at 84.6 g, which becomes the lower difference threshold (L_l). As usual, the DL = (113.4 − 84.6)/2 = 14.4 and the PSE = (113.4 + 84.6)/2 = 99. The values yielded by the graphical method are very close to those obtained by the linear interpolation method. Sometimes the perpendicular dropped from 50% falls exactly on the value of the standard stimulus at the abscissa (which is not the case in Fig. 13.5). In such a situation, the constant error is zero. Thus the perpendicular dropped from 50% at abscissa indicates the value of PSE. If the constant error is positive, the perpendicular dropped from 50% will lie to the right of the value of the standard stimulus shown at abscissa and if the constant error is negative, the perpendicular dropped from 50% will lie to the left of the value of the standard stimulus at the abscissa. In this example, the constant error is equal to −1. Hence, the perpendicular drawn from 50% lies to the left of the value of standard stimulus shown at abscissa.


Fig. 13.5 Calculation of DL and PSE by the method of constant stimuli (abscissa: comparable stimulus value, 70 to 130; ordinate: per cent of times comparable judged heavier; marked points: Q3 = 113.4, Q2 or Mdn = 99)


Method of Average Error (also known as Method of Adjustment, Method of Reproduction or Method of Equivalent Stimuli)
The method of average error is the oldest method of psychophysics and is a sort of gift to psychophysics and astronomy. In this method, the subject is provided with an S_t and a C_o. The C_o is either greater or lesser in intensity than the S_t. He is required to adjust the C_o until it appears to him to be equivalent to the S_t.

The difference between S_t and C_o defines the error in each judgement. A large number of such judgements are obtained and the arithmetic mean (or average) of those judgements is calculated. Hence, the name ‘method of average error or mean error’ is given. The obtained mean is the value for PSE. The difference between the S_t and PSE indicates the presence of the constant error, or CE. Equation 13.1 may thus be applied for calculating the CE. If the PSE (or average adjustment) is larger than the S_t, CE is positive and indicates overestimation of the standard stimulus. On the other hand, if the PSE is smaller than the S_t, CE is negative and indicates underestimation of the standard stimulus.
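As a small illustration of this computation, the sketch below averages a set of adjustments and subtracts the standard. The adjustment values and the standard of 100 units are invented for the example and are not taken from the text; the function name `average_error` is likewise hypothetical, and CE is taken as PSE minus S_t, as in Table 13.2.

```python
from statistics import mean


def average_error(adjustments, standard):
    """Return (PSE, CE): the mean adjustment and its deviation from the standard."""
    pse = mean(adjustments)
    return pse, pse - standard


# Hypothetical settings of the comparison stimulus over six trials.
adjustments = [102, 98, 101, 103, 99, 103]
pse, ce = average_error(adjustments, standard=100)

direction = "overestimation" if ce > 0 else "underestimation" if ce < 0 else "no constant error"
print(pse, ce, direction)
```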

The method of average error is distinguished from the two preceding methods in one important way. In the preceding two methods, the control over changes in the stimulus was entirely in the hands of the investigator or the experimenter. But in the method of average error the subjects themselves are permitted to control the variations or changes in the stimulus. In using the method of average error, care must be taken to see that the probability of systematic or constant error as well as variable error is minimised. This can be ensured by the following means:

1. In half of the total trials the C_o should be set at a value larger than the S_t and in the remaining half, the C_o should be set at a value smaller than the S_t. In this way, the direction of movement should be counterbalanced so that movement error may be minimized or cancelled. The movement error is the error produced by the subject’s bias for moving the comparable stimulus inward or outward.

2. The spatial presentation of the C_o and the S_t may also result in a systematic error called space error. The space error is defined as the error which is produced by the subject’s bias in adjusting the C_o with S_t when the former is placed either to the left or right of the latter in all the trials. Hence, for controlling space error, the C_o should be presented by placing it to the right of the S_t in half of the total trials and reversing its position in the remaining half.

3. The initial value of the C_o should be randomly changed from trial to trial so that the subject may not get any unnecessary cues known as ‘extraneous cues’ in adjusting or equating the C_o to the S_t.

The method of average error is also known by several other names. In this method, the subjects adjust the C_o to the S_t by making an active manipulation of the C_o. Hence, the method is also known as the ‘method of adjustment’. The purpose of the method is to determine equivalent stimuli by active adjustment of the C_o by the subject in each trial and hence, the method also goes by the name of ‘method of equivalent stimuli’. In this method, the subject tries to reproduce a given C_o in a way which may seem equivalent to the S_t. Hence, it is also known as the ‘method of reproduction’.

Fig. 13.6 Muller-Lyer illusion

The main purpose of this method is to calculate PSE, although DL can also be calculated. The method will be illustrated with data obtained from the experiment on the Muller-Lyer illusion (data are hypothetical). The Muller-Lyer illusion has been propounded by a German sociologist-cum-psychiatrist, Franz Carl Muller-Lyer (1857-1916). A figure producing the Muller-Lyer illusion is given in Figure 13.6.

In Figure 13.6, line Z is known as the feather-headed line and line X is known as the arrow-headed line. Commonly, the arrow-headed line is the fixed or standard stimulus and the feather-headed line is the comparable or variable stimulus. The experimenter sets the variable stimulus, that is, the feather-headed line, in a way that it appears clearly shorter or longer than the arrow-headed line. The task of the subject is to adjust the feather-headed line until it appears to be equal to the arrow-headed line. When the experimenter sets the feather-headed line shorter than the standard arrow-headed line, the subject must move it outward in order to make it equal to the standard. This is called the outward movement. When the experimenter sets the feather-headed line longer than the arrow-headed line, the subject must move it inward. This is called the inward movement.

In an experiment on the Muller-Lyer illusion, there are two common constant errors committed by the subject. They are the movement error and the space error. Movement error occurs when the subject has a certain bias for one of the two movements, that is, inward movement and outward movement, which unduly helps him in making the feather-headed line equal to the arrow-headed line. Similarly, space error results when the subject has a certain bias for one of the two spaces in the visual field, namely, right and left, which helps in adjusting the feather-headed line. Some subjects may get facilitation in adjusting the feather-headed line when it is placed in the right visual field and some subjects may get the same type of facilitation when the feather-headed line is placed in the left visual field. Both these facilitations tend to produce a













space error. The movement error and the space error should be controlled in the experiment. One convenient method of controlling the two types of constant errors is to use a counterbalancing design in which an equal number of settings or trials will be given for both the types of movements (inward movement and outward movement) and for both the types of space (right visual field and left visual field). Let O stand for outward movement, I stand for inward movement, L stand for left visual field (where the feather-headed line is placed left of the arrow-headed line) and R stand for right visual field (where the feather-headed line is placed right of the arrow-headed line). Following the counterbalancing design, a set of the 64 trials may be distributed as follows:

Order:       A            B            B            A
Space:       L            R            R            L
Movement:    OIIO         OIIO         OIIO         OIIO
             (16 trials)  (16 trials)  (16 trials)  (16 trials)
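The distribution of trials described above can be sketched in a few lines of Python. This is only an illustration: the function name `abba_schedule` and its parameter are hypothetical, and the sketch assumes that the O I I O pattern simply repeats within each block of 16 trials.

```python
def abba_schedule(trials_per_block=16):
    """Return the counterbalanced list of trial codes (e.g. 'LO', 'RI')."""
    positions = ["L", "R", "R", "L"]   # order A B B A for the space variable
    movements = ["O", "I", "I", "O"]   # movement pattern repeated inside a block
    schedule = []
    for position in positions:
        for i in range(trials_per_block):
            schedule.append(position + movements[i % 4])
    return schedule


schedule = abba_schedule()
# Count how often each of the four space/movement combinations occurs.
counts = {code: schedule.count(code) for code in ("LO", "LI", "RO", "RI")}
print(len(schedule), counts)
```

Each of the four combinations receives the same number of trials, which is what allows the space error and the movement error to be separated later.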


The raw data (hypothetical) are presented in Table 13.7. 


Table 13.7 Judgements (in mm) of the length of the feather-headed line made by a single subject in 64 trials: calculation of the extent of illusion shown

Trial  Direction  Length  | Trial  Direction  Length  | Trial  Direction  Length  | Trial  Direction  Length
                  judged  |                   judged  |                   judged  |                   judged
  1      LO         48    |  17      RO         45    |  33      RO         49    |  49      LO         48
  2      LI         47    |  18      RI         43    |  34      RI         50    |  50      LI         49
  3      LI         45    |  19      RI         46    |  35      RI         50    |  51      LI         47
  4      LO         47    |  20      RO         45    |  36      RO         48    |  52      LO         46
  5      LO         46    |  21      RO         46    |  37      RO         49    |  53      LO         46
  6      LI         45    |  22      RI         48    |  38      RI         47    |  54      LI         47
  7      LI         44    |  23      RI         43    |  39      RI         46    |  55      LI         48
  8      LO         51    |  24      RO         44    |  40      RO         46    |  56      LO         49
  9      LO         53    |  25      RO         45    |  41      RO         45    |  57      LO         50
 10      LI         49    |  26      RI         48    |  42      RI         43    |  58      LI         51
 11      LI         48    |  27      RI         47    |  43      RI         44    |  59      LI         52
 12      LO         47    |  28      RO         47    |  44      RO         47    |  60      LO         50
 13      LO         46    |  29      RO         46    |  45      RO         46    |  61      LO         49
 14      LI         47    |  30      RI         48    |  46      RI         43    |  62      LI         48
 15      LI         45    |  31      RI         49    |  47      RI         45    |  63      LI         45
 16      LO         44    |  32      RO         50    |  48      RO         46    |  64      LO         44

Mean of LO Series = 47.75        Mean of LI Series = 47.31

Mean of RO Series = 46.62        Mean of RI Series = 46.25

$$\text{Space error} = \frac{\text{Mean of LO} + \text{Mean of LI}}{2} - \frac{\text{Mean of RO} + \text{Mean of RI}}{2} = \frac{47.75 + 47.31}{2} - \frac{46.62 + 46.25}{2} = 47.53 - 46.435 = 1.095$$

$$\text{Movement error} = \frac{\text{Mean of LO} + \text{Mean of RO}}{2} - \frac{\text{Mean of LI} + \text{Mean of RI}}{2} = \frac{47.75 + 46.62}{2} - \frac{47.31 + 46.25}{2} = 47.185 - 46.780 = 0.405$$

Mean of 64 judgements or PSE = 46.98; SD of 64 judgements = 3.315

CE (or the extent of illusion) = PSE - Standard stimulus = 46.98 - 50 = -3.02
4 yl i! 





It is obvious from Table 13.7 that PSE is 46.98. Since the length of the standard stimulus (the arrow-headed line) is 50 mm, the CE is -3.02. This value of CE indicates the underestimation of the standard arrow-headed line. In other words, the subject, on the average, perceives the arrow-headed line to be shorter than the feather-headed line to the extent of 3.02 mm. Thus the extent of the Muller-Lyer illusion in the hypothetical data of Table 13.7 is 3.02. The evidence for the space error and the movement error has also been found, although these values are not large. The space error is 1.095 and the movement error is 0.405. From the data it is obvious that the subject has some bias, which facilitates the adjustment of the feather-headed line when it is placed to the left rather than the right of the standard arrow-headed line. Similarly, the subject appears to be more biased towards outward movement than towards inward movement in adjusting the feather-headed line.
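The arithmetic behind the space error, the movement error and the CE can be checked with a short Python sketch. It starts from the four series means reported with Table 13.7 and the 50 mm standard; the function name `constant_errors` is hypothetical, and the PSE is taken as the mean of the four series means, which is valid here only because every series has the same number of trials.

```python
SERIES_MEANS = {"LO": 47.75, "LI": 47.31, "RO": 46.62, "RI": 46.25}
STANDARD_MM = 50.0  # length of the standard arrow-headed line


def constant_errors(means, standard):
    """Return (space error, movement error, PSE, CE) from the four series means."""
    space = (means["LO"] + means["LI"]) / 2 - (means["RO"] + means["RI"]) / 2
    movement = (means["LO"] + means["RO"]) / 2 - (means["LI"] + means["RI"]) / 2
    pse = sum(means.values()) / len(means)  # equal trials per series assumed
    ce = pse - standard                      # negative value = underestimation
    return space, movement, pse, ce


space, movement, pse, ce = constant_errors(SERIES_MEANS, STANDARD_MM)
print(round(space, 3), round(movement, 3), round(pse, 2), round(ce, 2))
```

A positive space error means larger settings on the left, and a positive movement error means larger settings on outward trials, which is how the two biases are read in the text.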


WEBER’S LAW AND FECHNER’S LAW 

Weber's law, named after its discoverer E H Weber, was the first systematic attempt to formulate a principle which governed the relationship between psychological experience and physical stimulus. For a long time, this law has been the focus of psychophysical experimentation. The question that worried experimenters earlier was: Is the difference limen or JND (just noticeable difference) a fixed value for any given sense modality? This problem was thoroughly studied under the leadership of Weber, who came to the conclusion that the JND was not a fixed value, rather it increased with the size of the standard stimulus in a linear fashion. In other words, as the size of the standard stimulus is increased, the size of change needed for discrimination between the standard and the comparable stimulus (that is, JND) is also increased. Thus the greater the magnitude of the standard stimulus, the greater the size of the JND or DL. Weber's law is a mathematical statement of this fact. This relationship between the size of the standard stimulus and the size of JND is technically known as Weber's Law. For example, if one candle makes a just noticeable difference to an already lighted room having ten candles, it will take 10 candles to make the same difference in a lighted room having 100 candles. If the room is lighted with 1000 candles, 100 candles will be needed to make a just noticeable difference. Thus, it is obvious that JND bears a constant ratio with the standard stimulus. In the words of Underwood (1966, 165), "The law states that for a given stimulus dimension, the DL bears a constant ratio to the point on the dimension (standard stimulus) at which the DL was measured." The law may be stated in terms of the following equation:

$$\frac{\Delta R}{R} = K \qquad (13.6)$$

where, ΔR = DL; R = standard stimulus; and K = constant.

Equation 13.6 may be verbally expressed as shown below:

$$\frac{\text{DL}}{\text{Standard stimulus}} = \text{constant} \qquad (13.7)$$
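The candle example can be restated as a small Python sketch of Equation 13.6. The function name `jnd` is hypothetical; the Weber fraction of 1/10 is simply the ratio implied by one candle added to ten.

```python
def jnd(standard, weber_fraction):
    """Size of the just noticeable difference for a given standard (Weber's law)."""
    return weber_fraction * standard


K = 1 / 10  # one candle is just noticeable against ten candles

for candles in (10, 100, 1000):
    # The DL grows with the standard, but the ratio DL / standard stays at K.
    print(candles, jnd(candles, K), jnd(candles, K) / candles)
```

The third printed column stays the same in every row, which is the constancy that Equation 13.7 expresses in words.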




The constant in Weber's law is always a fraction and is known as the Weber fraction, or proportion. It indicates the proportion by which the standard stimulus must be increased in order to produce the just noticeable difference or to detect a change. If the addition of 8 g makes a just noticeable difference to a weight of 10 g, the Weber fraction will be 8/10 = 0.8, which indicates that the intensity of the standard stimulus must be increased by 0.8 of its value in order to perceive a difference between that stimulus and the other stimulus. If the Weber fraction is 0.8, a stimulus value of 100 will have to be increased by 0.8 of its magnitude, that is, by 80 (0.8 × 100), to be just noticeably different from 100. In other words, it should be 180. Likewise, if the stimulus value is 1000, its value should be increased by 800 in order to produce the just noticeable difference. Thus the ratio remains constant irrespective of the strength of the standard stimulus.

Weber's law, in general, has been regarded as a good measure of the overall sensitivity in different sense modalities. Generally, if the Weber fraction is larger, the DLs for the given stimulus dimensions will also be larger. One advantage of the Weber fraction is its direct comparability. Since the fraction is not dependent upon a physical unit in terms of which the standard stimulus and the DL are measured, the fractions can be compared across the different stimulus dimensions. Weber's fraction has been commonly reported to be 0.020 for heaviness, 0.030 for line length, 0.079 for brightness, 0.023 for finger span, 0.014 for electric shock, and 0.084 for a salty taste.

One general difficulty with Weber's law is that its precision is lost where the standard stimulus reaches the extremes, that is, when the standard stimulus becomes either very weak or very strong. Not only that, the Weber fraction is also influenced by the way the stimuli are presented (Ono 1979). When the presentation is such that the standard stimulus is followed by the comparable stimulus, the fraction reaches its maximum precision. When the order is reversed or modified, its precision is adversely affected. For these reasons, several alternative laws have been derived for fitting the function of DL and the standard stimulus. The reader is referred to Guilford (1954) for a detailed discussion of these laws.

Fechner derived his law (called the Fechner Law) from Weber's Law. Fechner's law is an indirect method of scaling* judgement where DL is used as the unit of the equal-interval scale. Fechner was of the view that the DL for each successive unit or psychological step can be determined by using a constant multiple. For example, suppose the Weber fraction is 0.25 (or 1/4) for a particular sensation. If one stimulus value is 20 units, the other stimulus should be 1/4 of 20 or (0.25 × 20) = 5 units more than 20, that is, it should be 25 for producing a just noticeable difference. Again, the other stimulus value at the next psychological step, to be just noticeably different from 25 units, should be 0.25 × 25 = 6.25 more than 25 units, that is, it should be 31.25. Likewise, at the next psychological step the other stimulus value should be 31.25 + 7.81 = 39.06 in order to produce the just noticeable difference from the stimulus value of 31.25. Thus the stimulus value required for each successive psychological step in order to produce a just noticeable difference should be 1¼ times the preceding one. In this way, for each successive unit, larger increments in stimulus value are needed to produce equal increments in psychological sensation. This increase in psychological sensation as a function of increment in stimulus value can also be easily described by a logarithmic relationship because it entails multiplication by a constant. For example, in the above example, for each successive step the stimulus value can also be found by using a constant multiple, 5/4: for the first step starting with the stimulus value of 20, it would be 5/4 × 20 = 25; for the second step it would be 5/4 × 25 = 31.25; and for the third step,


* On the basis of the method adopted for obtaining judgements there can be two types of scaling: direct scaling and indirect scaling. Direct scaling is one where direct methods for obtaining judgements regarding psychological quantities on the interval, the ratio or the ordinal scale are adopted. In such methods, the experimenter clearly states in the instruction the quantitative properties desired from the subject. He may, for example, give the instruction to judge by how many grams the C appears heavier or lighter than the S. Indirect scaling is one where no such direct methods of obtaining judgements are available and, usually, the equal intervals of judgements are derived from the same proportions or ratios. Hence, indirect scales can operationally be called confusion scales (Kling and Riggs 1984).







another stimulus value would be 5/4 × 31.25 = 39.06, and so on. It is obvious that increments in the stimulus value occur by a process of multiplication but increments in the resulting psychological sensation at each successive step occur by the process of addition. The former type of increment is known as 'geometrical progression' and the latter type of increment is known as 'arithmetical progression'. When one of the two variables increases in geometrical progression and the other in arithmetical progression, the relationship is termed a 'logarithmic relationship'. Fechner's law states that the stimulus values and the resulting psychological sensation have a logarithmic relationship so that when the former increases in geometrical progression, the latter increases in arithmetical progression. In other words, the law states that the magnitude of sensation (or response) varies directly with the logarithm of the stimulus value. This law of Fechner has been paraphrased by one author: "The sensation plods along step by step while the stimulus leaps ahead by ratios" (Woodworth 1938). Fechner has given several formulas for showing this relationship but the most common is:

$$R = K \log S \qquad (13.8)$$

where, R = magnitude of sensation or response; K = Weber's constant; and S = the magnitude of stimulus value.
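The contrast between the geometrical growth of the stimulus and the arithmetical growth of the sensation can be illustrated with a short Python sketch using the worked values above (starting stimulus 20, Weber fraction 0.25). The function names are hypothetical, and counting sensation in whole JND steps above the starting stimulus is an assumption made only for illustration.

```python
import math


def jnd_ladder(start, weber_fraction, steps):
    """Stimulus values that are each one JND above the previous one."""
    values = [start]
    for _ in range(steps):
        values.append(values[-1] * (1 + weber_fraction))  # multiply by 5/4
    return values


def jnd_steps_above(stimulus, start, weber_fraction):
    """Number of JND steps between `start` and `stimulus` (a logarithmic function)."""
    return math.log(stimulus / start) / math.log(1 + weber_fraction)


ladder = jnd_ladder(20, 0.25, 3)
print(ladder)  # stimulus values grow by a constant ratio
print([round(jnd_steps_above(s, 20, 0.25), 6) for s in ladder])  # sensation grows by 1
```

The second list rises by equal steps while the first rises by equal ratios, which is the logarithmic relationship that Equation 13.8 summarises.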

In formulating the above law, Fechner made two important assumptions.

1. The DL or JND indicates equal increments in psychological sensation irrespective of the absolute level at which it is produced.

2. Psychological sensation is the sum of all those JND steps which come before its origin.

Fechner’s law had been very influential, particularly in psychology’s early days, because the 
law showed that it was possible to relate the things of the mind to those of the body in precise, 
quantitative terms. But later investigators raised some objections and posed the question: Is it 
really true that sensory magnitude can only be measured indirectly as Fechner had claimed? 


STEVENS’ POWER LAW 
One of the important Harvard psychologists, S S Stevens, put forward the argument that sensory magnitude can be assessed by a straightforward method and that the indirect method, as devised by Fechner, is not the only method. Stevens adopted a method called the Method of Magnitude Estimation in which sensory magnitude could be directly assessed. He presented his subjects with a series of stimuli to which they had to assign numbers which were proportional to the corresponding subjective impressions of the subject. For example, if a tone appears three times louder than another, the subject was required to assign a number to the first that was three times larger than the number given to the second. Suppose, for the second tone he has assigned 4, then he would assign 12 for the first tone.

There were two major findings of Stevens' work as given below.

(i) The subjects had little or no trouble in performing the task. They can readily judge their own subjective experiences, thereby producing a direct scale of their subjective magnitude.

(ii) The second important finding was the particular relationship between this direct scale and physical intensity. According to Stevens, the function was not logarithmic as Fechner had proposed, rather it was a power function. In terms of equation, it was

$$S = kI^{N} \qquad (13.8a)$$

where, S = subjective magnitude; I = stimulus intensity; k and N are constants.




This power function is called the Power Law or, after its popularizer, Stevens' Power Law. According to this law, the actual rate of sensation change with changes in stimulus intensity depends upon the size of the exponent, that is, N. In other words, the intensity of a sensation is proportional to stimulus intensity raised to a certain power.

According to Stevens, when the subjective magnitude is plotted against physical magnitude, we find that the curves for power functions with different exponents (N) have dramatically different shapes as shown in Figure 13.7. If the exponent is less than 1 (such as visual brightness, for which N = 0.33), the curves are concave downward, which means that as the stimulus becomes more intense, greater stimulus changes are needed to produce the same degree of perceptual change. The result is that the sensation grows more slowly than stimulus intensity and in this respect, it is similar to Fechner's position, that is, a wide range of stimulus intensities is compressed into a small span of subjective values. When exponents are greater than 1 (such as for electric shock, for which N = 3.5), the curve becomes concave upward, which means that as stimuli become more intense, the same physical stimulus change produces an even bigger perceptual change than at lower stimulus intensities. The result is that a small range of physical intensities is expanded into a wider range of subjective values. Here sensation grows more rapidly than stimulus intensity. When the exponent (N) is exactly 1, as we find in the case of apparent length of lines, the function that relates subjective and physical magnitude is a straight line and there is neither compression nor expansion.
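The effect of the exponent can be made concrete with a minimal Python sketch of Equation 13.8a. The function names are hypothetical and k is set to 1 purely for convenience; the three exponents are the ones quoted in the text.

```python
def sensation(intensity, exponent, k=1.0):
    """Subjective magnitude under the power law S = k * I**N."""
    return k * intensity ** exponent


def growth_when_doubled(exponent):
    """Factor by which sensation grows when stimulus intensity is doubled."""
    return sensation(2.0, exponent) / sensation(1.0, exponent)


for label, n in (("brightness", 0.33), ("apparent length", 1.0), ("electric shock", 3.5)):
    # Below 2 = compression, exactly 2 = linear, above 2 = expansion.
    print(label, round(growth_when_doubled(n), 2))
```

Doubling the stimulus less than doubles the sensation when N is below 1, exactly doubles it when N equals 1, and more than doubles it when N exceeds 1.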
Fig. 13.7 Average magnitude estimates (vertical axis) plotted against stimulus magnitude (horizontal axis, 10 to 100), with curves for electric shock (exponent > 1), apparent length (exponent = 1) and loudness (exponent < 1)
If one is asked to judge the relative merits of Fechner's law and Stevens' law, one will lean in favour of Stevens' law, yet the issue remains unsettled. In fact, Stevens extended Fechner's effort and demonstrated that it was possible for the subjects to deal with sensations head-on and scale them directly.

Here the primary assumption is that categories are in correct rank order and that their boundary lines are stable, except for sampling errors. It is also assumed that the distribution of responses to a stimulus is normal on the psychological continuum. There is also an implicit assumption that momentary judgements (placement in categories) are perfectly correlated with momentary responses (psychological attributes).


NEWER PSYCHOPHYSICAL METHODS 

Apart from these major psychophysical methods, there are other such methods which have also been used in the scaling process. Among them two methods, namely, the method of category scaling and the method of magnitude estimation, deserve special attention.




Method of Category Scaling

This is a psychophysical scaling method in which stimuli are grouped into a predetermined number of categories on the basis of their perceived strength along a certain dimension. In fact, this is one of the direct methods of scaling whose importance was first shown by Sanford. He asked his subjects to judge a number of envelopes, each of which contained different weights, and sort them into five categories. Category 1 was meant for the lightest weight and Category 5 for the heaviest, and the remaining categories were distributed in such a way that intervals between the category boundaries would be subjectively equal. As a consequence, the difference in sensation between the upper and the lower boundaries of Category 1 would be the same as that of Category 2. Thus all categories would be of the same size. That is why this method is also known as equal-interval scaling. Since Sanford, category scaling has been studied in greater detail, particularly by Stevens & Galanter (1957), who frequently used a scale of 7 categories rather than 5 categories.

After the subjects have finished the sortings, the mean ratings for each stimulus across subjects are calculated. Such mean ratings are taken to reflect an interval scale. Thus the absolute level of rated intensity of each stimulus is interpretable, as are the relative distances between the mean ratings for different stimuli. This is one of the major advantages of category scaling. However, there are some disadvantages, too. Here responses are linked to a few labels or categories only. Not only this, in this scaling method, stimuli that are similar but still discriminably different from one another may be grouped into the same category.


Method of Magnitude Estimation 
This method was developed by Stevens (1956). It was originally introduced as a psychophysical method to be applied to physical stimuli but it has also been successfully applied for scaling psychological stimulus dimensions. This represents the most direct approach to scaling. In this method, the subject is presented with a standard stimulus that serves as the starting point for his judgements. The experimenter also defines the subjective numerical value of the standard stimulus, called the modulus. The subject's task is to assign each presented stimulus a number that reflects the apparent magnitude of that stimulus relative to the standard stimulus. For example, in a typical magnitude estimation experiment the experimenter may wish to scale the loudness of different tones. For this, he would first present a tone of certain loudness to the subject, who would be instructed to call the loudness of this tone 100 (modulus). After that, the subject would be asked to assign all succeeding tones a number relative to that standard. If the next tone was judged twice as loud, he would assign it 200 and if it was judged half as loud, it would be assigned 50. Obviously, this is a very direct way of measuring sensation. It is also clear that the subject makes ratio judgements, that is, judging a stimulus to be n times above or below the standard, so the resulting scale will be a ratio scale. The magnitude estimation method was the base of Stevens' Power Law. This demonstrates the heuristic value of the method.

Another variation of the magnitude estimation method has been developed in which the subject is not required to assign numbers to each stimulus. In this modified form, the subject is required to judge the stimuli on one dimension by producing responses on another dimension. This method is named cross-modality matching. This may be explained with an example. Subjects may be asked to judge the length of a line by varying the loudness of a tone. The louder the tone, the longer the line is judged to be.
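Since magnitude estimates are ratio judgements, one common way to summarise them is to estimate the exponent of a power function by fitting a straight line to the logarithms of the estimates and the intensities. The Python sketch below illustrates that idea only; the least-squares fit on log-log values is an assumption not described in the text, the function name `fit_power_law` is hypothetical, and the tone intensities and estimates are invented for illustration (modulus 100 for the first tone).

```python
import math


def fit_power_law(intensities, estimates):
    """Least-squares line through (log I, log S); returns (exponent N, constant k)."""
    xs = [math.log10(i) for i in intensities]
    ys = [math.log10(e) for e in estimates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, 10 ** intercept


# Hypothetical data: each doubling of intensity raises the estimate by about 1.41.
intensities = [10, 20, 40, 80]
estimates = [100, 141, 200, 283]

exponent, k = fit_power_law(intensities, estimates)
print(round(exponent, 2))
```

An exponent below 1 in such a fit would correspond to the compressive case, and an exponent above 1 to the expansive case.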





PSYCHOLOGICAL SCALING METHODS

Here three important scaling methods, namely, the method of rank order, the method of pair comparison and the method of successive categories, will be discussed.




Method of Rank Order

The method of rank order, also known as the 'Order of merit method', is one of the techniques for psychological scaling of objects, persons, events and other items. The method was developed mainly due to the extensive works of Cattell (1903) and Spearman (1904), who demonstrated how to use ranks in the computation of correlation. In this method, all stimuli are presented simultaneously to the subject, who is requested to rank them in order from high to low. Conventionally, the rank of 1 indicates the best and the higher numbers indicate the inferior position of the stimulus being ranked. The method of rank order is best suited when the number of stimuli or objects to be ranked is small. For the purpose of ranking, all stimuli are presented together and therefore, no problem concerning the sequence of presentation of objects arises. This is one of the main advantages of the rank-order method over the paired comparison method, where all stimuli are usually not presented for simultaneous observation.

The scale resulting from the method of rank order is known as an 'ordinal scale'. When ranking has been done, the obtained data may be subjected either to rather elaborate computational procedures as described by Guilford (1954) or to some simpler treatments, which are common in many research works. Here, we shall limit our discussion to the simpler treatments of rank data. Suppose 10 actors were ranked by 10 judges who have to rank them in terms of how well they liked their acting. Rank 1 indicated the most liked actor and rank 10 indicated the least liked actor. Applying suitable statistical techniques, we can measure the amount of agreement or disagreement among the judges. The data (hypothetical) are given in Table 13.8. From these rankings, a scale can be derived by converting the ranks into choice scores and then to p values and finally to z values. If a judge gives rank 1 to a certain actor, it means that he prefers (or chooses) him to other actors. Likewise, rank 2 indicates that this actor is preferred to the remaining actors, and so on. Thus ranks given by the judge indicate his choice score, C. In general, C can be estimated by the following formula:


$$C = n - r \qquad (13.9)$$

where, C = choice score; n = number of objects ranked; and r = rank assigned.

Suppose an object is assigned rank 3 (the total number of objects being 10), its choice score = 10 - 3 = 7. Since Equation 13.9 holds true for all the ranks assigned by the judges to the same object, the mean rank, M_r, and the mean choice score, M_c, for that object can easily be determined by the equations given below:

$$M_r = \frac{\sum r}{N} \qquad (13.10)$$

where, M_r = mean rank of an object; Σr = sum of ranks assigned by all judges to an object; N = number of ranks (or judges).

$$M_c = n - M_r \qquad (13.11)$$

where, M_c = mean choice score; n = number of objects ranked; M_r = mean rank.

Subsequently, the value of M_c is converted into a p value as under:

$$p = \frac{M_c}{n - 1} \qquad (13.12)$$

where the symbols are as usual. Each p value is converted into a z value with the help of a table of the normal curve. There are certain checks on the computations




shown above. The average M_r must be equal to (n + 1)/2. In Table 13.8, the average value of M_r is 5.5, which is equal to (10 + 1)/2 = 5.5. Likewise, the average M_c must be equal to (n - 1)/2. The average value of M_c is 4.5, which is equal to (10 - 1)/2 = 4.5. Similarly, the average p must be equal to 1/2 or 0.5, which is the case in Table 13.8.
1 
pe . 
Table 13.8 Ranks reduced to choice scores and z scores

                                      Actors
Judges              A      B      C      D      E      F      G      H      I      J
1                   1      5      2      3      4      6      8      7     10      9
2                   1      2      4      3      5      7      6      8      9     10
3                   1      3      4      2      6      7      8      9      5     10
4                   1      2      3      4      6      7      5      8      9     10
5                   1      2      3      4      5      6      7      8     10      9
6                   1      2      4      3      5      6      8      7      9     10
7                   1      3      4      2      5      6      8      7      9     10
8                   1      4      2      3      5      6      8      7     10      9
9                   2      3      1      4      5      6      7      8      9     10
10                  2      1      3      4      6      5      7      8      9     10
Σr                 12     27     30     32     52     62     72     77     89     97
M_r               1.2    2.7    3.0    3.2    5.2    6.2    7.2    7.7    8.9    9.7
M_c = n - M_r     8.8    7.3    7.0    6.8    4.8    3.8    2.8    2.3    1.1    0.3
p = M_c/(n - 1)  0.98   0.81   0.78   0.76   0.53   0.42   0.31   0.26   0.12   0.03
z               +2.05  +0.88  +0.77  +0.71  +0.08  -0.20  -0.50  -0.64  -1.18  -1.88

Checks: The average M_r = (n + 1)/2; the average M_c = (n - 1)/2; the average p = 1/2 or 0.5.
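The conversion from ranks to choice scores, p values and z values can be traced with a short Python sketch. It starts from the rank sums (Σr) of Table 13.8 and follows Equations 13.10 to 13.12; the function name `scale_from_rank_sums` is hypothetical, and the use of `statistics.NormalDist` in place of a printed normal-curve table is an assumption of this sketch. Because p is not rounded to two decimals before conversion, the z values may differ slightly from the tabled ones.

```python
from statistics import NormalDist

# Sum of ranks given by the 10 judges to each of the 10 actors (Table 13.8).
RANK_SUMS = {"A": 12, "B": 27, "C": 30, "D": 32, "E": 52,
             "F": 62, "G": 72, "H": 77, "I": 89, "J": 97}
N_JUDGES = 10


def scale_from_rank_sums(rank_sums, n_judges):
    """Return {object: (mean rank, mean choice score, p, z)}."""
    n_objects = len(rank_sums)
    normal = NormalDist()
    result = {}
    for name, total in rank_sums.items():
        mean_rank = total / n_judges           # Equation 13.10
        mean_choice = n_objects - mean_rank    # Equation 13.11
        p = mean_choice / (n_objects - 1)      # Equation 13.12
        z = normal.inv_cdf(p)                  # normal deviate for p
        result[name] = (mean_rank, mean_choice, p, z)
    return result


scale = scale_from_rank_sums(RANK_SUMS, N_JUDGES)
for name, (m_r, m_c, p, z) in scale.items():
    print(name, m_r, round(m_c, 1), round(p, 2), round(z, 2))
```

Averaging the first three columns of the printed output reproduces the three checks listed with the table.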


The method of rank order has several advantages over its closely related rival called the rating scale. First, it forces the judges to use all parts of the scale. In a rating scale a judge may easily use only some parts of the scale, putting all objects into those categories. But this is not true in the case of the rank order method. Second, since some objects are to be assigned higher ranks and some objects are to be assigned lower ranks, judges are left with no way to commit the error of central tendency, which is common in rating scales. Third, the ranking method requires the judge to differentiate every item from every other item. Thus, the method forces the judges to take a decision regarding every item by making a discrimination, which may easily be evaded in the rating method.

The rank-order method has one serious limitation. It does not provide quantitative information regarding the amount of the characteristic which segregates the ranks. For example: Can we say that the difference in amount of the characteristic segregating ranks 1 and 2 is equal to the amount of the characteristic segregating ranks 2 and 3 or 3 and 4? The answer is probably 'No'. It happens because the rank-order method does not account for the differences among the characteristics being ranked.














In an attempt to refine the rank-order method, Guilford (1954) has suggested that the ranks can be transformed into centiles and subsequently, to another convenient transformation, that is, C scale values. One of the assumptions in such a transformation is that the trait or attribute judged through the different items is normally distributed in the samples of items presented for ranking. The C scale is based upon the normalized transformation of ranks with a mean of 5 and SD of 2. When ranks are transformed into C values, they can easily be treated as interval scale values and all statistical analysis suitable to the interval measurement can be carried out.

Method of Successive Categories

The method of successive categories, also known as the method of successive intervals, is a technique of scaling originally developed by Thurstone and later [...] published by Saffir (1937), who made a comparative study of the scale values derived by three different methods, namely the method of pair comparisons, the method of rank order and the method of successive categories. Like the method of equal-appearing intervals, in the method of successive categories each subject is required to make a single judgement for each statement as belonging to one of a limited number of categories, which tend to differ quantitatively along a continuum. Therefore, the method is a convenient one, especially when the number of statements to be scaled is large. The subjects may be asked to sort each statement into any one of the nine categories mentioned below:

    1       2       3       4       5       6       7       8       9
    A       B       C       D       E       F       G       H       I
    Unfavourable                 Neutral                 Favourable

Varying degrees of unfavourableness are represented by letters A to D (categories 1 to 4) and varying degrees of favourableness are represented by letters F to I (categories 6 to 9). E (category 5) is the neutral category. No assumption of psychological equality of these categories is made and in this sense, the method differs from that of the equal-appearing intervals, where it is so assumed. After the various statements have been sorted or rated into the various successive intervals or categories by a group of judges, the data are obtained in the same form as those obtained in the case of the equal-appearing intervals. Suppose a group of 400 judges rated or sorted 35 statements into nine successive intervals according to the degree of favourableness or unfavourableness that each statement is assumed to express. Of these, the obtained data for four items (for illustrative purposes) have been presented in Table 13.9. The data also show the frequencies (f) with which each statement was sorted into a particular category. These frequencies were cumulated from left to right to yield cumulative frequencies (cf), which were transformed into cumulative proportions (cp) by multiplying each cf by the reciprocal of N. On the basis of the distributions of the cumulative proportions of the set of items, the widths of the intervals making up the psychological continuum are determined. Here we assume that the distribution of the cumulative proportions for each item is normal. Subsequently, the scale values of each item are determined by calculating the mean or median of the cumulative proportions. For the purpose of showing the elaborate calculations involved in estimating the scale values, the cumulative proportions are segregated from Table 13.9 and have been presented in Table 13.10.
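
The cumulation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `cumulative_proportions` is hypothetical, and the frequencies used for statement 1 are the ones implied by the cumulative frequencies reported for that statement in Table 13.9 (an assumption where the printed frequency row is unclear).

```python
from itertools import accumulate


def cumulative_proportions(frequencies):
    """Return (cumulative frequencies, cumulative proportions) for one statement.

    The frequencies are cumulated from left to right (lowest category first),
    and each cumulative frequency is multiplied by the reciprocal of N.
    """
    cf = list(accumulate(frequencies))
    n_judges = cf[-1]  # the last cumulative frequency equals N
    cp = [c * (1 / n_judges) for c in cf]
    return cf, cp


# Assumed frequencies for statement 1 over the nine successive categories.
f_statement_1 = [2, 10, 40, 100, 150, 45, 30, 15, 8]
cf_1, cp_1 = cumulative_proportions(f_statement_1)

print(cf_1)
print([round(p, 3) for p in cp_1])
```

The same function can be applied to each statement in turn to build the rows of a table of cumulative proportions.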

Table 13.9 Successive intervals data showing frequencies, cumulative frequencies and cumulative proportions for items sorted by 400 judges

Statement            Successive intervals
                 1       2       3       4       5       6       7       8       9
                 Unfavourable                                          Favourable
1        f       2      10      40     100     150      45      30      15       8
         cf      2      12      52     152     302     347     377     392     400
         cp  0.005   0.030   0.130   0.380   0.755   0.868   0.942   0.980   1.000
2        f       0       0     100      80      60      80      50      20      10
         cf      0       0     100     180     240     320     370     390     400
         cp  0.000   0.000   0.250   0.450   0.600   0.800   0.925   0.975   1.000
3        f       0       0       0      20      40     100     200      30      10
         cf      0       0       0      20      60     160     360     390     400
         cp  0.000   0.000   0.000   0.050   0.150   0.400   0.900   0.975   1.000
4        f      21      15      10       3      40     170     100      31      10
         cf     21      36      46      49      89     259     359     390     400
         cp  0.053   0.090   0.115   0.122   0.222   0.648   0.898   0.975   1.000

Table 13.10 Cumulative proportions ($p_{ij}$) of 4 items sorted by 400 judges by the method of successive intervals

Statements           Successive intervals
                 1       2       3       4       5       6       7       8       9
1            0.005   0.030   0.130   0.380   0.755   0.868   0.942   0.980   1.000
2            0.000   0.000   0.250   0.450   0.600   0.800   0.925   0.975   1.000
3            0.000   0.000   0.000   0.050   0.150   0.400   0.900   0.975   1.000
4            0.053   0.090   0.115   0.122   0.222   0.648   0.898   0.975   1.000

The proportions for each item given separately for each successive interval may be designated as $p_{ij}$. In other words, each cell entry of Table 13.10 may be designated as $p_{ij}$, where $i$ indicates the statement and $j$ indicates the upper limit or boundary of any particular category or interval. Thus $p_{ij}$ is the proportion of judges who place statement $i$ below the upper limit of the $j$th category, and $1 - p_{ij}$ indicates the proportion of judges who place statement $i$ above the upper limit of the $j$th category. For instance, $p_{36}$ is 0.400, that is, 0.400 of the judgements for item number 3 fall below the upper limit of the 6th interval and 1 - 0.400 = 0.600 of the judgements for item number 3 fall above the upper limit of the 6th interval. The next step is to convert the $p_{ij}$ into the corresponding normal deviates. If $p_{ij}$ is 0.50, its corresponding normal deviate will be 0.0 and the scale value of the statement will fall exactly on
the upper limit of the $j$th interval (or on the lower limit of the $j$th interval plus one). If $p_{ij}$ is greater than 0.50, the corresponding normal deviate will be positive and the upper limit of the $j$th interval will deviate positively from the scale value of the statement. On the other hand, if $p_{ij}$ is less than 0.50, the corresponding normal deviate will be negative and the upper limit of the $j$th interval will deviate negatively from the scale value of statement $i$. Entering the Table of Normal Deviates* with the $p_{ij}$ of Table 13.10, normal deviates for the successive intervals for each statement were found and presented in Table 13.11. In converting the $p_{ij}$ into the corresponding normal deviates, any value of $p_{ij}$ less than 0.02 or greater than 0.98 is ignored because such proportions are near 0 and 1 respectively, where the normal deviates become infinite. Hence, the upper category, that is, 9, and the lower category, that is, 1, are generally not scalable.

* The Table of normal deviates can be seen at p. 246 in Techniques of Attitude Scale Construction by A. L. Edwards (published in India by Vakils, Feffer & Simons, Bombay, 1969); the American publisher is Appleton Century Crofts, USA.

The cell entry or normal deviate of Table 13.11 may be designated as $z_{ij}$, in which $i$ stands for a particular statement and $j$ stands for the upper limit of the $j$th interval. Thus $z_{37}$ means that the upper limit of interval 7 has been expressed as a normal deviate in terms of item 3.

Table 13.11 Normal deviates ($z_{ij}$) corresponding to the upper limits or boundaries of the successive intervals

Statements           Successive intervals
                 1       2       3       4       5       6       7       8       9
1                -  -1.881  -1.126  -0.305  +0.706  +1.126  +1.555  +2.054       -
2                -       -  -0.674  -0.126  +0.253  +0.842  +1.405  +2.054       -
3                -       -       -  -1.645  -1.036  -0.253  +1.282  +2.054       -
4           -1.645  -1.341  -1.175  -1.175  -0.772  +0.385  +1.282  +2.054       -

In converting $p_{ij}$ into $z_{ij}$, the values of $p_{ij}$ were rounded to two decimal places before conversion.

The next step is to compute the width of the successive intervals and for this purpose, some kind of averaging is done. If the z matrix (Table 13.11) is complete, that is, if no cell values are indeterminate, which can happen only when no $p_{ij}$ values are greater than 0.98 or less than 0.02, the widths of the successive intervals will be the differences between the means of the successive intervals or columns in the z matrix. On the other hand, if the z matrix is incomplete (as it is in Table 13.11), the width of the successive intervals is estimated by making successive subtractions in pairs in each row of Table 13.11. The resulting width of the interval is designated as $w_{ij}$, which simply indicates the width of the $j$th interval provided by statement $i$. Thus the width of interval 3 provided by item 1 ($w_{13}$) is equal to -1.126 - (-1.881) = 0.755. Other estimates of the width of interval 3 have similarly been obtained. The procedure is repeated for obtaining the width of each interval. Table 13.12 presents the $w_{ij}$ of the 7 categories. The widths of the two extreme categories, category 1 and category 9, cannot be estimated by the procedure being described because the lower limit of category 1 and the upper limit of category 9 cannot be determined. All entries in a given category of Table 13.12 are estimates of the width of the same interval. Subsequently, the arithmetic means of the entries of each column or category are calculated as the estimates of the width of the various successive intervals. The arithmetic means may be designated as $\bar{w}$. In Table 13.12, the first row at the bottom indicates the sum of category entries or column entries. The second row indicates the number of entries in each column; the third row indicates the
mean or $\bar{w}$ of the categories; and entries in the fourth row indicate cumulating means, which form the basis for our psychological continuum.

Table 13.12 Estimates of category widths ($w_{ij}$) on the basis of the differences between the upper limits of intervals

Statements           Successive intervals
               2-1     3-2     4-3     5-4     6-5     7-6     8-7
1                -   0.755   0.821   1.011   0.420   0.429   0.499
2                -       -   0.548   0.379   0.589   0.593   0.649
3                -       -       -   0.609   0.783   1.535   0.772
4            0.304   0.166   0.000   0.403   1.157   0.897   0.772

Now, when the common psychological continuum has been established, it is easier for the investigator to scale each statement upon the continuum. For finding out the scale value of each statement, the median may be calculated by the following formula:

$$S_i = l + \left(\frac{0.50 - \sum p_b}{p_w}\right)\bar{w}_i \qquad (13.13)$$

where,
$S_i$ = scale value of statement $i$;
$l$ = lower limit of the interval on the psychological continuum upon which the median falls;
$\sum p_b$ = the sum of the proportions below the interval in which the median falls;
$p_w$ = proportion within the interval in which the median falls and, therefore, it is equal to the difference between the cumulative proportion of the category where the median falls and the cumulative proportion of the category falling immediately below the category which contains the median;
$\bar{w}_i$ = mean width of the interval in which the median falls.

Now, the scale values of the above four statements may be calculated as shown below.

Statements
1    1.170 + [(0.50 - 0.380)/0.375] x 0.600 = 1.362
2    1.170 + [(0.50 - 0.450)/0.150] x 0.600 = 1.370
3    2.507 + [(0.50 - 0.400)/0.500] x 0.886 = 2.684
4    1.770 + [(0.50 - 0.222)/0.426] x 0.737 = 2.251
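
As a rough check on such hand calculations, Equation 13.13 can be evaluated directly. The sketch below is illustrative only: the function name `scale_value` and its parameter names are hypothetical, and the inputs for statement 1 are the lower limit, proportions and mean width quoted in the text.

```python
def scale_value(lower_limit, sum_p_below, p_within, mean_width):
    """Median-based scale value of a statement (Equation 13.13).

    lower_limit : lower limit (l) of the interval containing the median
    sum_p_below : cumulative proportion below that interval
    p_within    : proportion falling within that interval
    mean_width  : mean width of that interval on the psychological continuum
    """
    return lower_limit + ((0.50 - sum_p_below) / p_within) * mean_width


# Statement 1: l = 1.170, sum of p below = 0.380, p within = 0.375, mean width = 0.600
s1 = scale_value(1.170, 0.380, 0.375, 0.600)
print(round(s1, 3))
```

Because the median always lies inside the interval that contains N/2, the result should fall between the lower limit of that interval and the lower limit plus its mean width; this gives a quick sanity check on any computed value.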






The steps involved in calculation of the scale value of statement (or item) No. 1 may be enumerated as shown below.

1. The first step is to determine that interval in which N/2 (which is here, 200) falls. Cumulative counting from left to right for item 1 in Table 13.9 shows that N/2 falls in the 5th interval or category.
2. The second step is to determine $\sum p_b$, which is the cumulative proportion of the interval immediately below the 5th interval. From Table 13.10, it is read as 0.380.
3. The third step is to determine $p_w$, which is the difference between the cumulative proportion of the interval which contains N/2 and that of the interval immediately below it. From Table 13.10, $p_w$ = 0.755 - 0.380 = 0.375.
4. The fourth step is to determine the width (or mean) of the interval which contains the median; from Table 13.12, it is read to be 0.600.
5. The lower limit of the interval ($l$) on the psychological continuum upon which the median falls is 1.170 (from cum $\bar{w}$).
6. Finally, all values are substituted in Equation 13.13 and the resulting median becomes the scale value of each statement.

In this way, each statement is scaled by the method of successive categories. Subsequently, the scale values of each statement may be tested for internal consistency if desired by the investigator.
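
The earlier conversion of cumulative proportions into normal deviates, and the successive subtraction that yields interval widths, can also be sketched in Python. This is a hedged illustration rather than the author's procedure: the function names are hypothetical, the standard library's `NormalDist().inv_cdf` stands in for the printed Table of Normal Deviates, and rounding each proportion to two decimals with round-half-even is an assumption chosen because it appears to agree with the printed deviates.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from statistics import NormalDist


def normal_deviates(cum_props):
    """Convert cumulative proportions (given as strings) into normal deviates.

    Proportions are first rounded to two decimals; values below 0.02 or
    above 0.98 are treated as not scalable and returned as None.
    """
    deviates = []
    for p in cum_props:
        p2 = Decimal(p).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
        if p2 < Decimal("0.02") or p2 > Decimal("0.98"):
            deviates.append(None)
        else:
            deviates.append(NormalDist().inv_cdf(float(p2)))
    return deviates


def interval_widths(z_row):
    """Width of each interval = upper-limit deviate minus the one just below it."""
    widths = []
    for lower, upper in zip(z_row, z_row[1:]):
        widths.append(None if lower is None or upper is None else upper - lower)
    return widths


# Cumulative proportions of statement 1, as listed in Table 13.10.
cp_1 = ["0.005", "0.030", "0.130", "0.380", "0.755", "0.868", "0.942", "0.980", "1.00"]
z_1 = normal_deviates(cp_1)
w_1 = interval_widths(z_1)  # w_1[0] is interval 2 (2-1), w_1[1] is interval 3 (3-2), ...

print([None if z is None else round(z, 3) for z in z_1])
print([None if w is None else round(w, 3) for w in w_1])
```

Small differences in the third decimal place from printed values are to be expected, since the printed widths are differences of deviates that were already rounded.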


Method of Pair Comparisons 

The method of pair (or paired) comparisons is one of the very common methods of scaling. In this method, stimuli are paired and the subject is required to make a comparative judgement by saying which member of each pair is preferred or possesses more of the trait being scaled. Hence, the method is named the method of pair comparisons. Before the procedure of the method is considered in detail, it is necessary to examine the law of comparative judgement, which is the base of the method of pair comparisons.


Law of Comparative Judgement 
Thurstone (1927) formulated a law of comparative judgement, which explained data obtained on 
the basis of comparative judgements of two stimuli. Each judgement in comparative judgement 
does not yield a quantitative value on a psychological continuum. But the law of comparative 
judgement made this possible through a series of computational processes. The law may be 
defined as a series of assumptions, which relate the proportion of times one stimulus is judged 
higher on a given attribute than the other stimulus (of the pair) to the scale values and discriminal 
dispersions of these two stimuli which are repeatedly being compared. The set of assumptions is 
derived from the following four postulates: 

1. The stimulus presented to the subject produces a response, which has some value on the 
psychological continuum. In Thurstone’s terminology, this response is known as the ‘discriminal 
process’, which may be designated as S;. Thurstone (1927) defined the discriminal process as 
“that process by which the organism identifies, distinguishes, or reacts to stimuli.” Thus a 
discriminal process is a theoretical concept and indicates the subject’s reaction or response when 
asked to make a judgement regarding any one of two stimuli on a given attribute. 

2. A given stimulus, if presented repeatedly, does not produce the same discriminal 
process. Sometimes, the value of the discriminal process may increase and sometimes, it may 
decrease. The reasons why the changes in the value of the discriminal process of the same 
stimulus occur are the momentary fluctuations in the subject. 

3. If any stimulus is presented to the subject a large number of times, it will generate a frequency distribution, which is a normal one. Thus, each stimulus generates a normal distribution of discriminal processes. The distribution is illustrated in Figure 13.8.

4. The mean and standard deviation of the normal distribution of the discriminal process generated by a stimulus correspond to its scale value and the discriminal dispersion respectively.


The discriminal process which is most frequently aroused by a given stimulus is known as the 'modal discriminal process' and may be designated as $\bar{S}_i$ (see Figure 13.8). Since in a normal distribution the mean, median and mode have the same value, the modal discriminal process, being the mean of the set of discriminal processes aroused by a given stimulus, is taken as the scale value of that stimulus. The scale values (or the modal discriminal processes) and the discriminal processes for the different stimuli will not necessarily be the same. This point has been illustrated in Figure 13.9.

The distribution of discriminal processes obtained from four stimuli has been shown in Figure 13.9. The scale value of each stimulus is its modal discriminal process. Thus, the scale value of stimulus 1 is $\bar{S}_1$, the scale value of stimulus 2 is $\bar{S}_2$, and so on.

Fig. 13.8 Normal distribution of discriminal process generated by stimulus

Fig. 13.9 Distribution of discriminal process generated by the four stimuli on the same psychological continuum

Since the subject cannot directly tell the value of the discriminal process on the given psychological continuum (that is, heavier, brighter, louder), no frequency distribution of discriminal processes can be directly obtained. Hence, the scaling of stimuli is done indirectly through a set of equations which relates the judgements made by the subjects to the scale values and discriminal dispersions on the psychological continuum. The law of comparative judgement does this job well. Let $i$ and $j$ stand for two statements (or stimuli) on a psychological continuum. Suppose that a group of subjects has been asked to judge which of these two statements is more favourable. On each pair presentation to each individual, $i$ and $j$ will elicit a separate discriminal process. Let the discriminal process aroused by $i$ be designated as $S_i$ and the discriminal process aroused by $j$ be designated as $S_j$. The difference between $S_i$ and $S_j$ is known as the 'discriminal difference'. If these two statements are presented to each member of the group, the discriminal differences themselves are likely to form a normal distribution on the psychological continuum. As we know from the elementary principles of statistics, the difference between two normally distributed variables is also normally distributed, with a standard deviation given by

$$\sigma_{i-j} = \sqrt{\sigma_i^2 + \sigma_j^2 - 2 r_{ij}\sigma_i\sigma_j} \qquad (13.14)$$

where,
$\sigma_{i-j}$ = standard deviation of the differences between the discriminal processes of $i$ and the discriminal processes of $j$, that is, SD of the discriminal differences ($S_i - S_j$);
$\sigma_i$ = standard deviation of the discriminal processes generated by $i$, that is, SD of $S_i$;
$\sigma_j$ = standard deviation of the discriminal processes generated by $j$, that is, SD of $S_j$;
$r_{ij}$ = correlation between $S_i$ and $S_j$.

Thurstone has presented five cases of the law of comparative judgement. His Case I is a complete form of the law based upon Equation 13.14 and the normal deviates, $z_{ij}$, corresponding
to the proportions of comparative judgements (that is, $P_{ij}$). Thus $P_{ij}$ is the proportion of times $i$ is judged greater or higher than $j$. When we wish to express the proportion of times $j$ is judged greater than $i$, the subscript becomes $P_{ji}$. When the comparative judgements of $i$ and $j$ have been obtained by a group of judges, we have frequencies, which show how many subjects judged $i$ greater than $j$. Accordingly, $f_{ij}$ may be written to indicate those frequencies. When $f_{ij}$ is multiplied by the reciprocal of $N$, we get the value of $P_{ij}$ which, in turn, can be expressed as a unit normal deviate $z_{ij}$ with the help of the Table of normal deviates. The equation for Case I of the law of comparative judgement in its complete form is based upon Equation 13.14 and may be written as under:

$$\bar{S}_i - \bar{S}_j = z_{ij}\sqrt{\sigma_i^2 + \sigma_j^2 - 2 r_{ij}\sigma_i\sigma_j} \qquad (13.15)$$

where $\bar{S}_i$ and $\bar{S}_j$ indicate the modal discriminal processes or scale values of statements $i$ and $j$ respectively. The remaining symbols are defined as usual. Equation 13.15 applies to the repeated judgements of a single subject. Thurstone's Case II of the law of comparative judgement is essentially parallel to his Case I and requires the same kind of complete data as in Equation 13.15. His Case II is meant for a situation where several subjects make a single judgement on each possible pair, and Equation 13.15 applies to this situation, too. The complete solution of Equation 13.15 is not possible because we have more unknowns than knowns. Hence, three approximations to Equation 13.15, which simplify the assumptions associated with it, have been proposed. These are known as Cases III, IV and V of the law of comparative judgement.

Suppose we have five statements expressing attitudes towards birth control. Now these statements are presented in all possible pairs to a group of 200 subjects. The maximum number of pairs from five statements will be given by $\frac{n(n-1)}{2} = \frac{5(5-1)}{2} = 10$. For five statements we will have the 10 equations given by Eq. 13.15 and these 10 equations will have 20 unknowns, that is, 5 scale values, 5 standard deviations and 10 intercorrelations, and 5 knowns, that is, the values of $z_{ij}$ calculated on the basis of $p_{ij}$ with the help of the Table of normal deviates. In order to solve the problem posed by the unknown intercorrelations, Thurstone assumed that the intercorrelations are zero, that is, no correlation exists between the responses to any pair of statements. This assumption is more defensible where the stimuli are not clearly identifiable. As a consequence, the law reduces to the following form (because the last part is dropped):

$$\bar{S}_i - \bar{S}_j = z_{ij}\sqrt{\sigma_i^2 + \sigma_j^2} \qquad (13.16)$$

Equation 13.16 illustrates Case III of the law of comparative judgement. Still, in the above example we will be having more unknowns (5 scale values and 5 standard deviations) than knowns. To solve this problem posed by the discriminal dispersions or standard deviations, Thurstone assumed that the discriminal dispersions are nearly equal. Then, the law reduces to

$$\bar{S}_i - \bar{S}_j = 0.707\, z_{ij}(\sigma_i + \sigma_j) \qquad (13.17)$$

Equation 13.17 illustrates Case IV of the law of comparative judgement. Discriminal dispersions will be equal when stimuli are equally easy to place on a scale. Now Thurstone made one additional assumption to simplify his law further. He assumed that all discriminal dispersions are equal, that is, $\sigma_i = \sigma_j$, and then the law reduces to

$$\bar{S}_i - \bar{S}_j = z_{ij}\,\sigma_i\sqrt{2} \qquad (13.18)$$

in which $\sigma_i$ stands for both discriminal dispersions. Taking $\sigma_i\sqrt{2}$ to be the common unit of measurement of scale separations of the various pairs of statements or stimuli and letting this common unit of measurement be equal to unity or 1, the law finally reduces to

$$\bar{S}_i - \bar{S}_j = z_{ij} \qquad (13.19)$$
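
The way the successive cases simplify the law can be illustrated numerically. The following is a minimal sketch under stated assumptions: the function name `scale_separation` is hypothetical, and the dispersions (1.0 and 1.2) are made-up values used only to show that the Case IV form approximates the Case III form when the dispersions are nearly equal.

```python
from math import sqrt


def scale_separation(z, sd_i=1.0, sd_j=1.0, r=0.0):
    """Complete form of the law: z * sqrt(sd_i^2 + sd_j^2 - 2*r*sd_i*sd_j)."""
    return z * sqrt(sd_i ** 2 + sd_j ** 2 - 2 * r * sd_i * sd_j)


z = 0.674  # normal deviate for a proportion of 0.75

# Case III: intercorrelation assumed to be zero (hypothetical dispersions).
case_iii = scale_separation(z, sd_i=1.0, sd_j=1.2, r=0.0)

# Case IV: dispersions assumed nearly equal, so sqrt(a^2 + b^2) ~ 0.707 * (a + b).
case_iv = 0.707 * z * (1.0 + 1.2)

# Case V: equal dispersions with sd * sqrt(2) taken as the unit (= 1),
# so the separation equals the normal deviate itself.
case_v = scale_separation(z, sd_i=1 / sqrt(2), sd_j=1 / sqrt(2), r=0.0)

print(round(case_iii, 3), round(case_iv, 3), round(case_v, 3))
```

The closer the two dispersions are to each other, the smaller the gap between the Case III and Case IV values becomes.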


Equation 13.19 illustrates Case V of the law of comparative judgement. It is theoretically an equal interval scale. Still, we are left with 5 unknowns, that is, 5 scale values. The steps involved in the calculation of scale values will be discussed in the section below.

Experimental procedure of the method of paired comparisons

The law of comparative judgement discussed in the previous section requires that each stimulus should be compared with every other stimulus a large number of times so that the proportion of times stimulus $i$ has been judged greater or higher than stimulus $j$ is clearly known. The method for having some empirical estimates from these proportions is technically known as the method of paired comparison.

Originally, the method of paired comparison was introduced by Cohn in 1894 in his study of colour preferences of subjects. Subsequently, it was further developed by Thurstone. As a matter of fact, the method of paired comparison is nothing but an extension of the method of constant stimulus difference having two-category responses. In the method of constant stimulus difference, each time two stimuli are presented to the subject, of which one is a standard stimulus and one a comparable stimulus. Thus, each time each stimulus is compared with a single standard. In the method of paired comparisons each stimulus is compared with every other stimulus and therefore, each stimulus serves as a standard stimulus in turn. When the subject is presented with the two stimuli in pairs, his job is to tell which stimulus of the pair is greater (heavier, more favourable, louder) on the psychological continuum (or the attribute being scaled). No equality judgement is allowed and therefore, the subject must identify one of the members as being greater than the other. Even if the subject finds some difficulty in making a comparative judgement, he is encouraged to give his judgement in favour of any one member of the pair because in the method of pair comparison, there is no provision for analysis of the responses showing failure to select one member of the pair. Ordinarily, a stimulus is not compared with itself and it is assumed that if such judgements are obtained, their proportion will be 0.5 or their frequency will be half of the total number of subjects/judgements. It should be, however, noted that there is nothing in the method itself which restricts such judgement.

In the method of paired comparison, some biasing effects in the form of space error, time error, fatigue effect and practice effect may occur and it is, therefore, essential that such effects be adequately controlled. In his law of comparative judgement, Thurstone has made no provision for controlling these effects and hence, it deserves more attention. Perhaps the most common method of controlling space error is to present one member of the pair on the right (or above, etc.) for half of the subjects or trials and on the left for the other half of the subjects or trials. Similarly, time error can be controlled by presenting one member of the pair first for half the time and by presenting the same member of the pair second for the remaining half of the time. Likewise, by reversing the order of the presentations of pairs for half of the subjects or trials, the practice effect or fatigue effect may be controlled. An attempt should also be made to keep pairs having one member in common maximally separated in the order of presentation.

Ordinarily, there are two criteria for using the method of paired comparison. First, the number of stimuli to be scaled should be small. If the number is large, the method should preferably not be used because it would tax the motivation and interests of the subject. Second, where the purpose is to obtain an interval scaling of the psychological dimension, the method is preferred. In order to obtain data suitable for being analyzed by the method of paired comparison, it is essential that a large number of comparisons be made for each pair of stimuli. For this purpose, the situation must be in accord with any of the following three conditions:

1. A single subject must have judged each pair a large number of times.
2. Many subjects must have judged each pair only once.
3. Many subjects must have judged each pair several times.
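
One simple way of arranging the presentations described in this section can be sketched as follows. It is only an illustration under stated assumptions: the function name `paired_presentations` and the labels A to E are hypothetical, and the sketch covers the counterbalancing of position and order only, not the maximal separation of pairs that share a member.

```python
from itertools import combinations


def paired_presentations(stimuli, subject_index):
    """Return the ordered list of pairs to present to one subject.

    Every stimulus is paired once with every other stimulus. For every
    second subject, the members of each pair swap positions (space/time
    error) and the sequence of pairs is reversed (practice/fatigue).
    """
    pairs = list(combinations(stimuli, 2))
    if subject_index % 2 == 1:
        pairs = [(second, first) for first, second in reversed(pairs)]
    return pairs


statements = ["A", "B", "C", "D", "E"]
schedule_even = paired_presentations(statements, subject_index=0)
schedule_odd = paired_presentations(statements, subject_index=1)

print(len(schedule_even))                   # number of pairs, n(n - 1)/2
print(schedule_even[0], schedule_odd[-1])   # same pair, positions reversed
```

With five statements each subject sees ten pairs, and across an even number of subjects each member of a pair appears equally often in each position.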





Now, we can return to the details of the empirical estimates of the scale values to be made by the method of paired comparison. In the previous example, 5 statements relating to attitudes towards birth control were judged in all possible pairs by 200 subjects so that each subject made [5(5 - 1)]/2 = 10 comparative judgements. The raw data showing the number of times one statement of the pair was judged more favourable than the other statement have been arranged in the N x N matrix, which gives the frequency with which each column statement or stimulus was judged more favourable than the row statement or stimulus (see Table 13.13).

Table 13.13 The F matrix for 5 statements judged by 200 subjects

Statements       1       2       3       4       5
1              100     150     170     160     170
2               50     100     120     150     180
3               30      80     100     120     180
4               40      50      80     100     110
5               30      20      20      90     100

In Table 13.13, diagonal entries are assumed to be equal to N/2 because no comparison of a statement with itself is made. In this example, each subject judged each pair only once. Hence, for each pair there were 200 comparative judgements, which is equal to the number of subjects. On the basis of the F matrix of Table 13.13, the P matrix shown in Table 13.14 is prepared. Each entry in Table 13.13 has been multiplied by the reciprocal of N. The N is 200 and its reciprocal is 0.005.

Table 13.14 P matrix prepared on the basis of the F matrix, that is, Table 13.13

Statements       1       2       3       4       5
1            0.500   0.750   0.850   0.800   0.850
2            0.250   0.500   0.600   0.750   0.900
3            0.150   0.400   0.500   0.600   0.900
4            0.200   0.250   0.400   0.500   0.550
5            0.150   0.100   0.100   0.450   0.500
Sums         0.750   1.500   1.950   2.600   3.200

Entries shown in the P matrix indicate the proportion of times the column stimulus is judged more favourable than the row stimulus and may be abbreviated as $p_{ij}$. Subsequently, each column in the P matrix is summed, ignoring the diagonal entries. It is desirable to rearrange the stimuli or statements in the increasing order of rank of the column sums of the P matrix so that the smallest column sum is at the left and the largest column sum is at the right of the P matrix. While making the arrangement, the corresponding pairs in row and column should also be interchanged. For example, if one interchanges the column entries for statement 3 and statement 2, the corresponding row entries for statements 3 and 2 should also be interchanged. No rearrangement of row and column entries is needed here because the sum of each column has already been obtained in increasing order of ranks. Now, with
the help of the Table of Normal Deviates we estimate the unit normal deviate, $z_{ij}$, corresponding to the $p_{ij}$ entries of Table 13.14. Table 13.15 shows the values of $z_{ij}$ and is known as the Z matrix. An inspection of the Z matrix reveals that if $p_{ij}$ is higher than 0.500, it gets a positive sign and if $p_{ij}$ is lower than 0.500, it receives a negative sign.

Table 13.15 Z matrix prepared on the basis of the P matrix, that is, Table 13.14

Statements            1        2        3        4        5
1                 0.000    0.674    1.036    0.842    1.036
2                -0.674    0.000    0.253    0.674    1.282
3                -1.036   -0.253    0.000    0.253    1.282
4                -0.842   -0.674   -0.253    0.000    0.126
5                -1.036   -1.282   -1.282   -0.126    0.000
Sums             -3.588   -1.535   -0.246    1.643    3.726    Sum = 0
N                     5        5        5        5        5
Mean             -0.718   -0.307   -0.049    0.329    0.745    Sum = 0
Mean + 0.718      0.000    0.411    0.669    1.047    1.463    Sum = 3.59

A $p_{ij}$ of 0.500 always corresponds to a $z_{ij}$ of 0.000. It also becomes obvious from the Z matrix that all the values of $z_{ij}$ above the diagonal entries of zero (that is, the upper-right portion of the matrix) are numerically identical with the values of $z_{ij}$ below the diagonal entries, that is, the lower-left portion of the matrix, except for the signs. Now if the entries in column 1 of Table 13.15 are subtracted from the entries in column 2, we would have, according to Equation 13.19, that is, Case V of Thurstone's law of comparative judgement, the equations given below (the first subscript denotes the row and the second the column of Table 13.15, so that each entry is the column scale value minus the row scale value):

$$
\begin{aligned}
z_{12} - z_{11} &= (\bar{S}_2 - \bar{S}_1) - (\bar{S}_1 - \bar{S}_1) = \bar{S}_2 - \bar{S}_1 = 0.674 - (0.000) = 0.674 \\
z_{22} - z_{21} &= (\bar{S}_2 - \bar{S}_2) - (\bar{S}_1 - \bar{S}_2) = \bar{S}_2 - \bar{S}_1 = 0.000 - (-0.674) = 0.674 \\
z_{32} - z_{31} &= (\bar{S}_2 - \bar{S}_3) - (\bar{S}_1 - \bar{S}_3) = \bar{S}_2 - \bar{S}_1 = -0.253 - (-1.036) = 0.783 \\
z_{42} - z_{41} &= (\bar{S}_2 - \bar{S}_4) - (\bar{S}_1 - \bar{S}_4) = \bar{S}_2 - \bar{S}_1 = -0.674 - (-0.842) = 0.168 \\
z_{52} - z_{51} &= (\bar{S}_2 - \bar{S}_5) - (\bar{S}_1 - \bar{S}_5) = \bar{S}_2 - \bar{S}_1 = -1.282 - (-1.036) = -0.246
\end{aligned}
$$

Or, in general,

$$z_{ij} - z_{ik} = (\bar{S}_j - \bar{S}_i) - (\bar{S}_k - \bar{S}_i) = \bar{S}_j - \bar{S}_k \qquad (13.20)$$

Similar equations can be written for the entries in column 3 minus the entries in column 2, for the entries in column 4 minus the entries in column 3, for the entries in column 5 minus the entries in column 4, and so on.

When the values shown above are summed together and divided by N, we get 0.411, which is equal to the difference between the mean of the entries in column 2 and the mean of the entries in column 1. Since the difference between the means is equal to the mean of the differences, we can get the same results by summing the entries in each column and dividing by N, that is, by finding the mean. This has been done for each column. The means thus obtained become the scale values of the respective statements. In Table 13.15, the first row at the bottom indicates the sum of each column; the sum of these sums should be zero, which is a check upon the accuracy of calculation. The second row represents the number of entries in each column. The third row represents the mean
I “pcr ts ana ecu r iy Methe wis mri Behavioural scleni es 
: F Mowasniry 
aso (eens 


of each column. These means become the scale values for the statements. The sum of these means
should be zero, which is taken as a check upon the accuracy of the calculation. Some of the scale
values are negative and some are positive. For convenience, one often wishes to avoid negative
signs in calculation. One way to get rid of a negative sign is to assign a zero value to the
lowest statement or stimulus. For this purpose, a constant positive value, equal in size to the
scale value of the lowest stimulus, is added to all the scale values. Here, a constant of +0.718
is added to the means of all the columns, which makes the scale value of the lowest statement zero
and the scale values of all other statements positive. Since the zero point of the scale values of
the statements on the given psychological continuum is arbitrary, the addition of a constant like
the above changes neither the distance between any two scale values nor their relative location on
the psychological continuum. As a check on the accuracy of calculation, the sum of the means after
adding the constant should be equal to n times the constant added. Here, +0.718 × 5 = 3.590, which
equals the sum of the means after adding the constant, that is,
0.000 + 0.411 + 0.669 + 1.047 + 1.463 = 3.590.

One should note carefully that when the Z matrix is incomplete, that is, when some cells are
vacant, the scale values cannot be estimated by the procedure described above. The vacancies in
the cells of the Z matrix occur when the corresponding $p_{jk}$ are extreme. $z_{jk}$ is
indeterminate when $p_{jk}$ is either 1.00 or 0.00, and very extreme proportions introduce
unreliability. One recommendation is that judgements be ignored whose $p_{jk}$ is greater than
0.98 or less than 0.02, for which $z_{jk}$ is +2.054 and −2.054 respectively. Guilford (1954)
suggested that $z_{jk}$ values more extreme than +2.0 and −2.0 be disregarded, for which the
corresponding $p_{jk}$ are 0.977 and 0.023 respectively.
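The additive-constant step and its two arithmetic checks can be illustrated with a short sketch. The unshifted column means used here are not quoted in the passage above; they are an assumption, obtained by subtracting 0.718 from the final scale values 0.000, 0.411, 0.669, 1.047 and 1.463. The function name is likewise only illustrative.

```python
# Column means of the Z matrix before the constant is added.
# ASSUMPTION: recovered by subtracting 0.718 from the reported final scale values.
raw_means = [-0.718, -0.307, -0.049, 0.329, 0.745]


def shift_to_zero_origin(means):
    """Add a constant so that the lowest scale value becomes zero."""
    constant = -min(means)
    shifted = [round(m + constant, 3) for m in means]
    return constant, shifted


constant, scale_values = shift_to_zero_origin(raw_means)

# Check 1: the unshifted means sum to zero (within rounding error).
assert abs(sum(raw_means)) < 1e-9
# Check 2: the shifted values sum to n times the constant added.
assert abs(sum(scale_values) - len(raw_means) * constant) < 1e-9

print("constant added:", constant)
print("scale values:", scale_values)
```

Because the same constant is added to every value, the distance between any two statements is the same before and after the shift.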

Having determined the scale values of the statements or stimuli by the method of paired
comparisons, we can next apply a test of internal consistency to the scale values. The internal
consistency check consists of comparing the empirical proportions $p_{jk}$ with the proportions
$p'_{jk}$ expected in terms of our derived scale values. The smaller the difference between the
empirical proportions $p_{jk}$ and the expected proportions $p'_{jk}$, the higher is the
consistency of the scale values. Ideally, a difference of zero indicates perfect consistency, but
this is rarely achieved.

The first step in applying the test of internal consistency is to obtain the theoretical normal
deviates $z'_{jk}$ for the scale separations or distances of the statements. For this we set up a
table like Table 13.16 in which the scale values of the statements are shown at the top and on the
left side. Subsequently, the scale values written on the left side are subtracted columnwise from
the scale values written at the top. The value thus obtained becomes the value for $z'_{jk}$. For
example, in Table 13.16, when the scale value of stimulus 2 (the second row) is subtracted from the
scale value of stimulus 1 (the first column), we get 0.000 − 0.411 = −0.411, and so on. The
theoretical normal deviates are obtained for the entries below the diagonal of Table 13.16.

Table 13.16 Theoretical normal deviates $z'_{jk}$ corresponding to the scale separations of the
statements shown in Table 13.18

Statements   Scale        1         2         3         4         5
             values     0.000     0.411     0.669     1.047     1.463
    1        0.000        -
    2        0.411     −0.411       -
    3        0.669     −0.669    −0.258       -
    4        1.047     −1.047    −0.636    −0.378       -
    5        1.463     −1.463    −1.052    −0.794    −0.416       -

Next, the theoretical proportion $p'_{jk}$ corresponding to each theoretical normal deviate is read
from the table of the normal curve. For example, for $z'_{jk}$ of −0.411 we get the $p'_{jk}$ value
of 0.341; for $z'_{jk}$ of −0.669, we get $p'_{jk}$ of 0.252, and so on (cf. Table 13.17). Having
determined $p'_{jk}$ for each entry below the diagonal, we next find the difference between the
empirical $p_{jk}$ of Table 13.14 and the corresponding expected theoretical proportions $p'_{jk}$
of Table 13.17. For this purpose, we subtract each entry of Table 13.17 from the corresponding
entry of Table 13.14 (that is, $p_{jk} - p'_{jk}$). Table 13.18 shows the discrepancies between
$p_{jk}$ and $p'_{jk}$. Subsequently, each column in Table 13.18 is summed, ignoring the signs. The
sum of the resulting absolute values is divided by $N(N-1)/2$, that is, the number of
discrepancies. In this way, we get the average error or discrepancy.

Table 13.17 Theoretical proportions $p'_{jk}$ corresponding to the theoretical normal deviates
$z'_{jk}$ of Table 13.16

Statements      1         2         3         4         5
    2         0.341       -
    3         0.252     0.398       -
    4         0.148     0.262     0.353       -
    5         0.126     0.146     0.214     0.339       -


Table 13.18 Discrepancies between theoretical proportions $p'_{jk}$ and empirical proportions
$p_{jk}$

Statements      1         2         3         4
    2        −0.091       -
    3        −0.102     0.002       -
    4        −0.052     0.012     0.047       -
  Sums        0.269     0.060     0.161     0.111
Thus,

$$\text{Average error or discrepancy} = \frac{\sum \lvert p_{jk} - p'_{jk} \rvert}{N(N-1)/2} \tag{13.21}$$

$$= \frac{0.601}{5(5-1)/2} = \frac{0.601}{10} = 0.0601 \approx 0.06$$

This average error for 5 statements is in accord with the values commonly obtained by other
investigators. Hevner (1930), for example, reported an average discrepancy or error of … for 20
stimuli when scaled by the method of paired comparison.
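The internal consistency check described above can be sketched in a few lines. The sketch derives the theoretical normal deviates and proportions from the five scale values and then averages the absolute discrepancies over the N(N − 1)/2 pairs. The function names are illustrative, the normal-curve table is replaced by the error function, and the "empirical" proportions in the demonstration are hypothetical, since the empirical table is not reproduced here.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Proportion of the unit normal curve lying below the deviate z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def theoretical_proportions(scale_values):
    """Return {(row, column): p'} for every entry below the diagonal.

    z' is the scale value at the top (column) minus the scale value on the
    left side (row), as in the table of theoretical normal deviates.
    """
    expected = {}
    for row in range(1, len(scale_values)):
        for col in range(row):
            z = scale_values[col] - scale_values[row]
            expected[(row + 1, col + 1)] = normal_cdf(z)
    return expected


def average_discrepancy(empirical, scale_values):
    """Sum of |p - p'| divided by N(N - 1)/2, the number of discrepancies."""
    expected = theoretical_proportions(scale_values)
    total = sum(abs(empirical[key] - expected[key]) for key in expected)
    n = len(scale_values)
    return total / (n * (n - 1) / 2)


scale_values = [0.000, 0.411, 0.669, 1.047, 1.463]
expected = theoretical_proportions(scale_values)
print(round(expected[(2, 1)], 3), round(expected[(3, 1)], 3))

# HYPOTHETICAL empirical proportions: each is set 0.05 above the theoretical value.
empirical = {key: p + 0.05 for key, p in expected.items()}
print(round(average_discrepancy(empirical, scale_values), 3))
```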


The method of paired comparison has wider applicability. In general, this method is applied
whenever stimuli can be presented in pairs, either simultaneously or in succession. This method
has been widely used in the determination of affective values as well as of aesthetic values. The
favourite stimuli for this method are colours, designs, names of personalities, countries, etc.
Some psychologists have also evaluated opinions on such questions as prohibition, war, religion
and the like by this method. In view of its wider applicability, it is hoped that it will soon
replace the rating scales where more exacting practical or experimental work needs to be done.


METHODS OF ATTITUDE SCALES OR OPINIONNAIRES 
Although the methods of scaling discussed in the previous section can be applied to the
construction of attitude scales, a separate section for attitude scales is essential because the
previous scaling techniques are relatively less common in attitude scaling. The measurement of
attitude is necessarily done indirectly. Attitude can be measured only on the basis of inferences
drawn from verbal statements regarding belief, feeling and tendency to act towards the object or
person. Attitude scales (also known as opinionnaires), which usually consist of a large number of
statements towards objects of attitudes, are one such indirect measure. Here we shall discuss the
three most common and frequently used techniques in the construction of attitude scales, namely
the method of equal-appearing intervals, the method of summated ratings and the method of
cumulative scaling.


Method of Equal-Appearing Intervals 

We have seen in the previous section that the method of paired comparison is not well suited to
the situation where the number of statements to be scaled is large, probably because subjects do
not have the patience to make a large number of comparative judgements. In such a situation, the
solution is to scale the statements through the method of equal-appearing intervals where each
subject is required to make only one comparative judgement for each statement. In the history of
development of techniques for the construction of attitude scales, Thurstone (1929), Thurstone &
Chave (1929) and Thurstone (1931) developed a method known as the method of equal-appearing
intervals. The method, as used by Thurstone and Chave, is given below.


A large number of statements, both favourable and unfavourable, towards the object of
attitudes, are collected from various sources. The number of items usually ranges from 100 to
200. Each statement is printed on a separate card and subjects are asked to sort these printed
statements into a number of intervals. Along with the statements, each subject is given 11 cards
on which letters A to K are written. These cards are arranged before the subjects in such a manner
that A is kept at the extreme left and K is kept at the extreme right. A indicates the most
unfavourable interval and K represents the most favourable interval. The middle category is
designated by the letter F, which is the neutral category. The cards lettered from G to K
represent various degrees of favourableness and the cards lettered from E to A represent various
degrees of unfavourableness as illustrated below:

     A     B     C     D     E     F     G     H     I     J     K
Unfavourable                    Neutral                    Favourable

Thurstone and Chave defined only the two extremes and the middle category (of the 11
intervals) on the ground that the undefined intervals between successive cards would represent
equal-appearing intervals for all the subjects. The subjects are requested to sort the given
statements in terms of 11 intervals represented by the 11 cards. Ordinarily, there is no time limit
for sorting. But Thurstone and Chave reported that subjects took 45 minutes in sorting 130
statements into 11 intervals. They made the following assumptions in this method:

1. The intervals into which the statements are sorted or rated are equal.

2. The attitude of the subjects does not influence the sorting of the statements into the
various intervals. In other words, subjects having favourable attitudes and those having
unfavourable attitudes would do the sorting in a similar manner. Thus the scale values of
the statements are independent of the attitude of the judges.

There is no fixed number of subjects to be engaged in the sorting work. However, Thurstone
and Chave used 300 subjects in sorting 130 statements. Other investigators like Edwards &
Kenney (1946) and Ferguson (1939) have demonstrated that reliable sorting can be done even
with a smaller number of subjects. When sorting is over, the next step is to determine the scale
value and Q value of each statement. For this, the easiest way is to arrange the frequencies of
judgements as shown in Table 13.19.

Table 13.19 Frequencies, median and Q value of the judgements obtained by the method of
equal-appearing intervals (N = 400)

                               Sorting intervals
Statement         A    B    C    D    E    F    G    H    I    J    K    Scale    Q
                  1    2    3    4    5    6    7    8    9   10   11    value    value
    1       f     2   10    4   10   50   20  200   60   30    8    6    7.02     0.525
            cf    2   12   16   26   76   96  296  356  386  394  400
    2       f     0    0   10   20   20  100   60   60  110   10   10    6.81     1.36
            cf    0    0   10   30   50  150  210  270  380  390  400
    3       f     0   20   10   90  100   10   10   20  120   10   10    5.30     2.28
            cf    0   20   30  120  220  230  240  260  380  390  400
    4       f     4   10   20   20   10   40   82  130   60   20    4    7.61     0.99
            cf    4   14   34   54   64  104  186  316  376  396  400

(f = frequency; cf = cumulative frequency)
Subsequently, the median and Q are calculated for each statement separately. The median
becomes the scale value of each statement and Q indicates the extent of disagreement among
subjects regarding the degree of attributes possessed by the statement. For illustrative purposes,
the details of calculation involved in estimating the median and Q value of the first statement are
given below.


$$\text{Mdn} = l + \left(\frac{N/2 - F}{f_m}\right) i \tag{13.22}$$

where Mdn = median; $l$ = the lower limit of the interval in which the median falls; $F$ = sum of
all frequencies below $l$, or the cumulative frequency of the interval just below $l$; $f_m$ =
frequency of the interval in which the median falls; and $i$ = width of the interval.

Substituting the values in Equation 13.22 for the first statement, we have

$$\text{Mdn} = 6.5 + \left(\frac{400/2 - 96}{200}\right) 1 = 6.5 + \left(\frac{200 - 96}{200}\right) 1 = 6.5 + \left(\frac{104}{200}\right) 1 = 6.5 + 0.52 = 7.02$$
Thus the scale value or median of the first statement is 7.02. For estimating Q, $Q_1$ and $Q_3$
are calculated first. $Q_1$ may be calculated by the following equation:

$$Q_1 = l + \left(\frac{N/4 - \text{cum.}f}{f_q}\right) i \tag{13.23}$$

where, $Q_1$ = first quartile or 25th percentile;
       $l$ = lower limit of the interval in which N/4 falls;
       $i$ = width of the interval, which is here assumed to be 1.00;
       cum. $f$ = cumulative frequency of the interval just below $l$; and
       $f_q$ = the frequency of the interval which contains N/4.

Likewise, $Q_3$ or the 75th percentile may be calculated by the following equation:

$$Q_3 = l + \left(\frac{3N/4 - \text{cum.}f}{f_q}\right) i \tag{13.24}$$

where, $Q_3$ = third quartile or 75th percentile;
       $l$ = lower limit of the interval in which 3N/4 falls;
       $i$ = width of the interval, which is here equal to 1;
       cum. $f$ = cumulative frequency of the interval just below $l$; and
       $f_q$ = frequency of the interval which contains 3N/4.

Now, substituting the values in Equation 13.23 and Equation 13.24, we get

$$Q_1 = 6.5 + \left(\frac{100 - 96}{200}\right) 1 = 6.5 + \left(\frac{4}{200}\right) 1 = 6.5 + 0.02 = 6.52$$

$$Q_3 = 7.5 + \left(\frac{300 - 296}{60}\right) 1 = 7.5 + \left(\frac{4}{60}\right) 1 = 7.5 + 0.067 = 7.567 \approx 7.57$$

Now, Q may be estimated by the equation

$$Q = \frac{Q_3 - Q_1}{2} \tag{13.25}$$

$$Q = \frac{7.57 - 6.52}{2} = 0.525$$
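The interpolation used for the median and the two quartiles can be checked with a short sketch. It assumes, as the worked example does, that sorting interval k runs from k − 0.5 to k + 0.5 with a width of 1; the function name is illustrative. Because the sketch keeps the unrounded quartiles, it gives Q of about 0.523, whereas the hand calculation, which rounds the third quartile to 7.57 first, gives 0.525.

```python
def percentile_point(freqs, fraction, width=1.0):
    """Interpolated point below which `fraction` of the judgements fall.

    ASSUMPTION: interval k (1-based) has the lower limit k - 0.5.
    """
    n = sum(freqs)
    target = fraction * n
    cum_below = 0
    for k, f in enumerate(freqs, start=1):
        if f > 0 and cum_below + f >= target:
            lower_limit = k - 0.5
            return lower_limit + (target - cum_below) / f * width
        cum_below += f
    raise ValueError("fraction must lie between 0 and 1")


# Frequencies of statement 1 over the sorting intervals A to K (N = 400).
statement_1 = [2, 10, 4, 10, 50, 20, 200, 60, 30, 8, 6]

median = percentile_point(statement_1, 0.50)  # scale value
q1 = percentile_point(statement_1, 0.25)
q3 = percentile_point(statement_1, 0.75)
q = (q3 - q1) / 2

print(round(median, 2), round(q1, 2), round(q3, 2), round(q, 3))
```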
Similar procedures have been repeated in the calculation of scale values and Q values for the
remaining three items. These values have also been written in Table 13.19. As we know, Q is a
measure of the spread of the middle 50% of judgements. When the judges are in close agreement
about the degree of favourableness or unfavourableness shown by a statement, the value of Q will
be small. When the value of Q is large, it is a clear indication that there is something wrong with
the concerned statement. Such statements are generally considered to be vague and ambiguous and,
therefore, should be either modified or dropped. Guilford (1954, 458) has said that items with
large Q values should be omitted first. According to Thurstone & Chave, when different judges
interpret the meaning of a statement in different ways, the value of Q becomes larger and hence,
such items should be dropped.
If one wishes to determine the scale values and the Q values through the graphical method
(and not through the method shown above), the best way to do it is to plot the ogive of the
judgements obtained for each item separately. The intervals on the psychological continuum can
be shown on the X axis and the cumulative per cent frequency $C_{\%}$ can be shown on the Y axis.
The cumulative per cent frequency is calculated by the following formula:

$$C_{\%} = C_f \times \frac{100}{N} \tag{13.26}$$

where, $C_{\%}$ = cumulative frequency per cent; $C_f$ = cumulative frequency; and
$N$ = number of judgements.

In terms of Equation 13.26, $C_{\%}$ for the first interval of the first item is equal to
2 × 100/400 = 0.5; for the second interval it is 12 × 100/400 = 3, and so on. As a check on
calculation, the $C_{\%}$ of the 11th interval should be equal to 100. If three perpendiculars, one
each from the 25th, 50th and 75th cumulative per cent, are drawn across the curve on to the
baseline or X axis, we will get three separate values. The value where the perpendicular drawn
from the 50th cumulative per cent touches the baseline will be the scale value or the median. The
value where the perpendicular drawn from the 25th cumulative per cent touches the baseline will be
$Q_1$, and the value where the perpendicular drawn from the 75th cumulative per cent touches the
baseline will be $Q_3$. One-half of the distance between $Q_1$ and $Q_3$ becomes the Q value.
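The graphical reading can be imitated numerically. In the sketch below, each cumulative per cent frequency is taken to be plotted at the upper limit of its interval (k + 0.5 for interval k) and neighbouring points are joined by straight lines; both are assumptions about how the ogive is drawn, and the function names are illustrative. Dropping a perpendicular from a given percentage then amounts to linear interpolation along the curve.

```python
def cumulative_percent(freqs):
    """Cumulative per cent frequency of each interval: Cf * 100 / N."""
    n = sum(freqs)
    running, result = 0, []
    for f in freqs:
        running += f
        result.append(running * 100 / n)
    return result


def read_from_ogive(freqs, percent):
    """Baseline (X axis) value at which the ogive reaches `percent`."""
    # ASSUMPTION: points are plotted at the upper limits 1.5, 2.5, ..., 11.5,
    # and the curve starts from 0 per cent at the lower limit 0.5.
    xs = [0.5] + [k + 0.5 for k in range(1, len(freqs) + 1)]
    ys = [0.0] + cumulative_percent(freqs)
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= percent <= y1:
            return x0 + (percent - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("percent must lie between 0 and 100")


statement_1 = [2, 10, 4, 10, 50, 20, 200, 60, 30, 8, 6]
c_percent = cumulative_percent(statement_1)
print(c_percent[:2], c_percent[-1])

scale_value = read_from_ogive(statement_1, 50)
q_value = (read_from_ogive(statement_1, 75) - read_from_ogive(statement_1, 25)) / 2
print(round(scale_value, 2), round(q_value, 3))
```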
Besides scale value and Q value, there is one more criterion used frequently in the selection
of statements by the method of equal-appearing intervals. This is known as the ‘criterion of
relevance’, which is based upon agreement or disagreement of subjects with the statements. After
determining the scale value and Q value for each statement, Thurstone and Chave presented 130
statements to 300 subjects (not for the purpose of measuring their attitude) with the request to
check those statements with which they agreed. Subsequently, for the purpose of knowing the
internal consistency of the statements, it was determined how many items or statements pass the
criterion of relevance. If the subjects who agreed with a particular statement were found to check
statements having widely different scale values, the statement was said to be irrelevant for
the attitude being measured and hence, was rejected. On the other hand, if the subjects who
agreed with a particular statement were found to check statements with similar scale values,
the statement was said to be a relevant one for the attitude being measured and hence, was included
in the scale. Finally, about 20 to 22 statements were selected. When the final selection is to be
made from among statements with approximately equal scale values, generally those statements are
selected whose Q values are small. Persons whose attitude is to be assessed are given the scale
consisting of 20 to 22 statements with the request to endorse those items with which they agree.
The attitude score of the person is the mean or median of the scale values of all those statements
endorsed by him. For example, suppose that the testee or respondent has agreed with five
statements having scale values 3.2, 3.6, 4.7, 4.9 and 5.8. Then his attitude score will be the sum
of these five values divided by the number of items, that is, 22.2/5 ≈ 4.4, and he is slightly
unfavourable in his attitude. It is also possible for the investigator to construct a parallel
form of the scale when it is constructed by the method of equal-appearing intervals.
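The scoring rule for a respondent, the mean or median of the scale values of the endorsed statements, is simple enough to state as a sketch. The scale values below are illustrative figures and the function name is hypothetical.

```python
from statistics import mean, median


def attitude_score(endorsed_scale_values, use_median=False):
    """Attitude score: mean (or median) scale value of the endorsed statements."""
    if use_median:
        return median(endorsed_scale_values)
    return mean(endorsed_scale_values)


# ILLUSTRATIVE scale values of five statements endorsed by one respondent.
endorsed = [3.2, 3.6, 4.7, 4.9, 5.8]

print(round(attitude_score(endorsed), 2))
print(attitude_score(endorsed, use_median=True))
```

On an 11-interval scale whose neutral interval is the sixth, a score below 6 lies on the unfavourable side.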














The method of equal-appearing intervals has its advantages and disadvantages. Its important
advantages are given below.

1. Thurstone scales enable the researcher to differentiate between a larger number of
people regarding their attitudinal position. Here item weights are averaged (median) and this
reveals a great variety of attitudinal positions. This makes it possible to make finer distinctions
among people according to the attitudes they have.

2. In a Thurstone scale, it can be said with increased confidence that the items being used have
a stronger claim to reliability because they are based upon the views of judges who have a higher
degree of agreement about the items used and who eliminate the bad items reflecting little or no
agreement.

This method has the following disadvantages: 

1. Judges or subjects do not keep the intervals equal. Farnsworth (1943) has found
evidence in support of this fact. As a matter of fact, in the method of equal-appearing
intervals there is no way through which this assumption can be tested. Thus one of the
assumptions of the method does not stand the rigours of the experimental test.

2. It is also said that the attitude of the subjects or judges tends to influence the sorting of
the statements into the intervals. In other words, the scale values of the statements are not
independent of the attitudes of the judges who do the sorting.

3. Thurstone and Chave have provided no objective basis for selecting the most
discriminating items from among items having approximately the same scale values. It may be
possible that items having approximately the same scale values differ in their discriminatory
power.

4. The subjects may do the sorting work carelessly and with least interest. In such a
situation, the interpretation of the scale values may be a difficult task. Thurstone and Chave
have, however, provided a technique to detect careless judgements so that they can be
eliminated. They have pointed out that if any subject sorts more than 30 statements into any one of
the 11 intervals, the judgements of that subject may be rejected on the ground that he has either
done the sorting carelessly or has misunderstood the instruction.
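The screening rule for careless judges mentioned in the last point can be sketched as follows. The data layout (a mapping from statement number to interval letter for each judge) and the function name are assumptions made for illustration, and the two judges are hypothetical.

```python
from collections import Counter


def is_careless(sorting, limit=30):
    """True if a judge placed more than `limit` statements in any one interval.

    `sorting` maps a statement number to the interval letter (A to K)
    chosen by the judge.
    """
    counts = Counter(sorting.values())
    return any(count > limit for count in counts.values())


intervals = "ABCDEFGHIJK"

# HYPOTHETICAL judge who spreads 130 statements evenly over the 11 intervals.
careful_judge = {s: intervals[s % 11] for s in range(130)}
# HYPOTHETICAL judge who puts the first 40 statements into the neutral interval F.
careless_judge = {s: ("F" if s < 40 else intervals[s % 11]) for s in range(130)}

print(is_careless(careful_judge), is_careless(careless_judge))
```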


Method of Summated Ratings 
Likert (1932) developed a different method for the construction of the attitude scale. His method,
named by Bird (1940, 159) as the method of summated ratings, is simpler than Thurstone’s method of
equal-appearing intervals. The main steps involved in Likert’s method may be summarized as
mentioned below:

1. A large number of multiple-choice-type statements, usually with five alternatives such as
strongly agree, agree, undecided, disagree and strongly disagree, concerning the object of attitude
are collected by the investigator. Two examples of items intended to measure attitude towards
nationalization are given below:

Nationalization improves the economy of the country.
Strongly agree      Agree      Undecided      Disagree      Strongly disagree
     (5)             (4)          (3)            (2)               (1)

Nationalization introduces a feeling of carelessness among employees.
Strongly agree      Agree      Undecided      Disagree      Strongly disagree
     (1)             (2)          (3)            (4)               (5)

2. Such statements are administered to a group of subjects who respond to each item by
indicating which of the given five alternatives they agree with.

3. Every responded item is scored with different weights. The weight ranges from 5 to 1.
For favourable statements a weight of 5 is given to ‘Strongly agree’, 4 to ‘Agree’, 3 to
‘Undecided’, 2 to ‘Disagree’, and 1 to ‘Strongly disagree’ (as shown in the first illustrative
example), and for unfavourable statements the order of weights is reversed so that ‘Strongly
agree’ receives 1 and ‘Strongly disagree’ receives 5.


4. After the weights have been given to items, a total score for each subject is found by
adding the weights earned by him on each item. Thus his total score is obtained after the weights
are summated over all the statements. Since a subject’s response to each item may be considered
as his rating of his own attitude on a five-point scale and his total score is obtained after all
these weights are summated, the method is known as the method of summated ratings.

5. Finally, selection of items is done through the procedure of item analysis. Probably, this
step of item analysis is an important step which distinguishes it from Thurstone’s method of
equal-appearing intervals. As we have seen, Thurstone’s method makes no use of item analysis in the
final selection of items. There are several methods of item analysis. Edwards (1957) has suggested
the setting of two extreme groups (high and low) on the basis of the total score and finding out
the significance of the difference between the means of the two groups by the t test. The value of
t will indicate the extent to which a given statement distinguishes between high and low groups.
But other methods such as correlational methods may also be used in place of the t test.

In the method of summated ratings, it is customary to select 20 to 25 statements, which
constitute the final attitude scale. As far as possible, half of the total statements should be
favourable, so that on these ‘Strongly agree’ receives the weight of 5, and the remaining half
should be unfavourable, so that on these ‘Strongly agree’ receives the weight of 1. This type of
arrangement is necessary to control certain response biases of subjects, which might be produced
if only favourable statements or only unfavourable statements are included in the attitude scale.

The Likert method also has advantages and disadvantages. Its major advantages are
mentioned below.

(i) Likert scales are easy to construct and take less time. This method is simpler and
easier than the method of equal-appearing intervals. Some empirical evidence is available to
support this contention. Rundquist & Sletto (1936, 5) have used the Likert method in the
construction of the attitude scale and expressed the belief that this method “... is less
laborious than that developed by Thurstone.” Edwards & Kenney (1946) have made a comparative study
of the Likert method and the Thurstone method and have concluded that the time required in the
construction of the attitude scale by equal-appearing intervals is almost twice the time required
by the method of summated ratings.

(ii) Scoring of a Likert scale is easy as well. Statements on a Likert scale are worded positively
or negatively and numerical weights are assigned to the responses. Subsequently, these weights are
summed to yield a total score. A high total score indicates a favourable attitude and a low total
score indicates an unfavourable attitude.

(iii) Likert’s summated ratings are the most common measurement format. The ease of
application and simplicity of interpretation have increased the popularity of this measurement
format in social science research.

(iv) Likert’s method of scaling possesses a sufficient degree of flexibility. Here the
investigator is free to include as many or as few items in his measure as he chooses. In this
scaling, because each item is presumed to count equally in measuring the concerned
phenomenon, increasing the number of statements increases the ability of the scale to reveal
differences in the phenomenon measured.

Despite these advantages, some disadvantages or weaknesses have been reported in Likert’s
method of scaling as mentioned below.

(i) According to the method of summated ratings, it is assumed that each item has an equal
weight in relation to every other item or statement. This is a questionable assumption. In fact,
different individuals may have a given attitude to the same degree, yet may respond differently to
different statements or items of the scale. Therefore, it is difficult, if not impossible, to
ensure that each statement counts the same as every other item.











(ii) The validity of Likert’s scaling is questionable. As we know, the process of deducing
items from an abstract universe of traits is a logical one. Therefore, there always exists the
possibility that some items may be wrongly included in the scale at any given time. The problem
is to know that we are measuring exactly what we claim to measure.

(iii) In summated ratings, the persons receiving the same score on a measure do not
necessarily possess the trait to the same degree. This obviously means that this method is not as
precise as it claims to be. Its raw scores may be regarded as crude estimates at best.

Despite these disadvantages the method of summated ratings, as devised by Likert, has been
successfully used in the assessment of attitudes. Recently, this method has also been fruitfully
used in assessing socio-economic status, intelligence, interest and special skills (Black &
Champion 1976).
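The scoring and item-analysis steps of the method of summated ratings can be drawn together in a brief sketch. Weights run from 5 to 1 and are reversed for unfavourable statements. The t value for an item is computed here from the high and low groups with the unpooled form, that is, the difference between the means divided by the square root of the sum of each group's variance over its size; treating this as the form of the t test intended is an assumption. All names and data are illustrative.

```python
from math import sqrt
from statistics import mean, variance

WEIGHTS = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}


def item_weight(response, favourable):
    """Weight of one response; the order is reversed for unfavourable items."""
    weight = WEIGHTS[response]
    return weight if favourable else 6 - weight


def total_score(responses, favourable_flags):
    """Summated rating of one subject over all statements."""
    return sum(item_weight(r, fav) for r, fav in zip(responses, favourable_flags))


def t_for_item(high_group, low_group):
    """t for the difference between the item means of the two extreme groups."""
    standard_error = sqrt(
        variance(high_group) / len(high_group) + variance(low_group) / len(low_group)
    )
    return (mean(high_group) - mean(low_group)) / standard_error


# One favourable and one unfavourable statement, as in the two example items.
flags = [True, False]
print(total_score(["SA", "SD"], flags))  # most favourable possible answers
print(total_score(["SD", "SA"], flags))  # least favourable possible answers

# HYPOTHETICAL weights earned on one item by the high and low scoring groups.
high = [5, 4, 5, 4]
low = [2, 1, 2, 3]
print(round(t_for_item(high, low), 2))
```

A larger t means the statement separates the high and low groups more sharply, which is the basis on which items would be retained.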


Guttman’s Scale, or Cumulative Scale 


Guttman’s method of scale analysis or scalogram analysis differs considerably from the two 
methods of attitude scale construction discussed previously. The Guttman scale is based upon 
the methods of cumulative scaling and has been defined by Guttman (1950) as follows: 


“We shall call a set of items of common content a scale if a person with a higher rank than 
another person is just as high or higher on every item than the other person.” 


Thus, if a set of statements with common content defines the Guttman scale, a person with a
higher score or rank than another person on the same set of statements will rank just as high as
or higher than him on each statement in the set. For example, the following items illustrate the
perfect Guttman scale:

(a) My height is more than 5′.        (b) My height is more than 5′3″.
(c) My height is more than 5′8″.      (d) My height is more than 6′.

A person who responds with ‘Yes’ to item (d) will also respond with ‘Yes’ to items (a), (b)
and (c). All the four items are measuring the same dimension, that is, height, and constitute what
Guttman (1944, 1945) called a ‘unidimensional scale’. Similarly, if a set of attitude statements
measures the same attitude, they are said to constitute a unidimensional scale or a Guttman
scale. According to Guttman, one advantage of the unidimensional scale is that from the total
score of the person one can reproduce the pattern of his responses to the set of statements.
Suppose, for example, that in the above example, ‘Yes’ is given a weight of 1 and ‘No’ is given a
weight of 0; then knowing that a person has secured a total weight of 4, we can say that he has
responded ‘Yes’ to items a, b, c and d. Likewise, if a person gets a total weight of 3, he has
responded ‘Yes’ to items a, b and c and ‘No’ to item d. Such perfect reproducibility holds in a
perfect Guttman scale only. In the case of attitude statements, such perfect reproducibility is
rarely achieved because some degree of irrelevancy is always present.
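The idea that a total score reproduces the whole response pattern in a perfect scale can be made concrete with the four height items. The sketch assumes the items are listed from the easiest to the hardest to endorse; the function name is illustrative.

```python
# Items ordered from the easiest to the hardest to answer 'Yes'.
ITEMS = ["a", "b", "c", "d"]


def pattern_from_total(total, items=ITEMS):
    """Reproduce the response pattern of a perfect scale from the total score."""
    if not 0 <= total <= len(items):
        raise ValueError("total score must lie between 0 and the number of items")
    return {item: ("Yes" if position < total else "No")
            for position, item in enumerate(items)}


print(pattern_from_total(4))
print(pattern_from_total(3))
```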


The major steps in the Guttman scale may be enumerated as shown below. 


1. A large number of statements are collected regarding the object of attitude. All
statements seem to indicate the same attitude. This constitutes what Guttman calls a universe of
items.

2. Out of these collected statements, a small number of items are selected. Usually, the
number of selected items does not exceed 20. According to Guttman, the selection of a small
number of items from the large number of possible items is dependent upon the intuition and
experience of the investigator. These selected items must be of homogeneous content. Thus one
should look for items in the Guttman scale which are, to a great extent, rephrasings of the
same content. Guttman believed that item analysis was not an essential part of scale analysis for
the selection of items, as it is in the case of Likert’s scale.


3. Each statement may have two alternatives such as ‘Agree-Disagree’ or more than two
alternatives such as ‘Agree, Neutral and Disagree’. All these items are administered to a group of
persons who respond to each item.

4. All items are scored or weighted and a total score, found by adding the weights on all items,
is obtained for each person.

5. On the basis of the total score, each subject is ranked from highest to lowest and is listed
in a column. Each row indicates the responses of a subject to different items (see Table 13.20).

6. Subsequently, it is determined whether or not the responses to each item are in close
agreement with the total scores. Marking the response category (such as ‘Agree’) which indicates
the quality being measured should show consistency with the higher total scores, and marking the
response category (such as ‘Disagree’ or ‘Neutral’) which poorly indicates the quality being
measured should show consistency with the lower total scores. If this is so, the test is a
homogeneous one and in this case, from the person’s total score (or rank) alone, we can reproduce
his response to each item. In a perfectly homogeneous test, the index of reproducibility will also
be perfect and therefore, the coefficient of reproducibility will be one. Such a case of perfect
reproducibility has been demonstrated in Table 13.20 where responses of 10 subjects towards five
items have been displayed. Each item has two response categories: ‘Agree’ and ‘Disagree’. The
response category ‘Agree’ is scored with +1 and the other response category ‘Disagree’ is scored
with 0. Subsequently, an attempt is made to evaluate the scalability of the items and for this
purpose, there are several procedures.

We shall here describe the most common method, adopted by Guttman (1947). The method is
called the ‘Cornell technique’. The question Guttman proposes to answer with this technique is
whether the set of statements is unidimensional or multidimensional. Following this technique, a
table is constructed in such a way that for each response category of each statement one column is
constructed and for each subject one row is constructed (see Table 13.20). Starting with the
highest total score, we place a cross mark in the appropriate cell of the table for indicating
responses given by each subject towards each item. Thus the table makes available all the
responses of all subjects along with the total scores.

Table 13.20 Perfect Guttman scale of 5 statements responded by 10 subjects. +1 indicates
favourable judgement and 0 unfavourable judgement

Knowing the total score of a subject, we can reproduce his responses to each of the five
statements. An inspection of Table 13.20 indicates that subject 1 has agreed with all the five
statements. Therefore, his score is five. Subjects 2 and 3 have similarly responded. Subject 4 has
responded ‘disagree’ to item number one but has agreed with the other items. Subject 5 has
responded ‘disagree’ to item one and item two and has agreed to the remaining items. Subject 6 has
responded ‘disagree’ to the first three items and has agreed with the last two. Subject 7 has
responded in a similar manner. Subject 8 has responded ‘disagree’ to the first four items and has
agreed with the last item, that is, the fifth item. Hence his total score is one. Subject 9 has
earned the score of zero, which means he has disagreed with all the five items. When the items
form a perfect scale and vary in graduated fashion in intensity like items 1, 2, 3, 4 and 5 in
Table 13.20, Guttman has named the resulting scale a perfect reproducible scale, which obviously
means that a person’s response pattern to all items in a set can be perfectly reproduced simply by
knowing his total score on the scale. Such perfect reproducibility of scale can’t be achieved by
either the Thurstone method or the Likert method and hence, this may be considered as one
advantage over these two scales.
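A rough numerical counterpart of this inspection is sketched below. It ranks subjects by total score, predicts each row from the total score alone, and counts the responses that depart from the prediction. The coefficient of reproducibility is taken in its usual form, 1 minus the number of errors divided by the total number of responses. Counting errors against the pattern predicted from the total score is a simplification adopted for the sketch and is not the cutting-point procedure of the Cornell technique. The response matrix follows the verbal description of the ten subjects, with the tenth subject's row assumed; the columns are assumed to be arranged from the least to the most endorsed item.

```python
def predicted_row(total, n_items):
    """Row expected in a perfect scale: the `total` easiest items are endorsed.

    ASSUMPTION: columns run from the least endorsed to the most endorsed item.
    """
    return [0] * (n_items - total) + [1] * total


def reproducibility(matrix):
    """1 - errors / (number of subjects x number of items)."""
    n_items = len(matrix[0])
    errors = sum(
        observed != expected
        for row in matrix
        for observed, expected in zip(row, predicted_row(sum(row), n_items))
    )
    return 1 - errors / (len(matrix) * n_items)


# Responses of 10 subjects to 5 items (1 = Agree, 0 = Disagree).
perfect = [
    [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1], [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1], [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1], [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],  # tenth subject: ASSUMED row
]

# Subjects ranked from the highest to the lowest total score.
ranked = sorted(perfect, key=sum, reverse=True)
print([sum(row) for row in ranked])
print(reproducibility(perfect))

# HYPOTHETICAL non-perfect data: the eighth subject agrees with item 1 only.
imperfect = [row[:] for row in perfect]
imperfect[7] = [1, 0, 0, 0, 0]
print(round(reproducibility(imperfect), 2))
```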

In practice, the ideal model of a perfect scale is rarely found, and therefore, it becomes necessary to examine the degree of reproducibility for the set of given attitude statements after making allowance for the errors. For this purpose, Guttman suggests that a cutting point should be located in the table. A cutting point may be defined as that point where the most common response shifts from one category to another in the statement. Locating a cutting point sometimes creates difficulties. To ward off these difficulties, Guttman (1947) has given two points as guidelines. First, the cutting point for each statement should be located at a place where the error is minimised. Second, the cutting point should be located in a way that "no category should have more error in it than non-error." In case of a perfect scale or perfect reproducibility, the responses above the cutting point fall in one category and the responses below the cutting point fall in another category. This is clear from the data presented in Table 13.20. The cutting points have been shown in Table 13.20 with horizontal lines. Now we are presenting another set of data in Table 13.21 for the purpose of illustrating a nonperfect scale. The cutting points for each statement have been shown by horizontal lines in Table 13.21, which displays the score matrix for 6 statements, each having two alternatives and answered by 10 subjects. In this table, the cutting points have been located for each statement according to the two-point suggestion of Guttman. The sum of each row of the score matrix yields the total score for each subject and these have been recorded in the last column. The first row at the bottom of Table 13.21 shows the frequency with which the two categories of each statement have been responded to. The sums for each column of the response category of 1 have been divided by the total number of subjects to obtain the proportions p. The proportions of the subjects giving 0 responses will thus be equal to 1 − p = q. The values of p and q have been shown in the second row at the bottom of the score matrix. Finally, errors for each response category have been counted and written in the bottom row of the score matrix. Here, errors indicate response inconsistencies. For the response category of 1 of the first statement, no response falls below the cutting point and hence, the error is zero, whereas for the response category of 0, one response falls above the cutting point but it should theoretically fall below the cutting point and hence, one error has occurred. Therefore, 1 has been written in the row of error against this category column. For the response category of 1 of the second statement, one response falls below the cutting point but it should theoretically fall above the cutting point and hence, one error has occurred. No response falls above the cutting point in the response category of 0 of the second statement and hence, no error has occurred. Similarly, error has been counted for each statement. The errors for all statements have been added together, which gives a total of 8. We have a total of 60 responses (that is, 10 × 6). The proportion of error is, therefore, equal to 8/60 = 0.133. Subtracting this value from unity, that is 1, we get the coefficient of reproducibility, which indicates the per cent accuracy


with which the responses to various statements can be reproduced. Thus, coefficient of reproducibility = 1 − 0.133 = 0.867 ≈ 0.87. Thus, the equation for the coefficient of reproducibility becomes

$$\text{Coefficient of reproducibility} = 1 - \frac{\text{number of errors}}{\text{number of responses}} \qquad (13.27)$$


Table 13.21 An illustration of the Cornell technique in the Guttman scale



According to Guttman, if the coefficient of reproducibility is below 0.90, no cumulative scale is said to exist, although if it lies between 0.85 and 0.90, a 'quasi scale' is said to exist. Thus, for Guttman, the coefficient of reproducibility must be at least 0.90 for constituting the cumulative scale.
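
The arithmetic of the coefficient of reproducibility and Guttman's cut-offs can be sketched as follows. The function names are hypothetical, and treating a value of exactly 0.85 as a quasi scale is an assumption made here for illustration, since the text gives only the range.

```python
def coefficient_of_reproducibility(n_errors, n_subjects, n_items):
    """1 minus the proportion of erroneous responses in the score matrix."""
    n_responses = n_subjects * n_items
    return 1 - n_errors / n_responses


def interpret(coefficient):
    """Apply the cut-offs quoted in the text (boundary handling is assumed)."""
    if coefficient >= 0.90:
        return "cumulative scale"
    if coefficient >= 0.85:
        return "quasi scale"
    return "no cumulative scale"


# Worked example from the text: 8 errors among 10 subjects x 6 statements.
rep = coefficient_of_reproducibility(n_errors=8, n_subjects=10, n_items=6)
print(round(rep, 3), interpret(rep))
```

With 8 errors in 60 responses the coefficient is about 0.867, so the illustrative data fall in the quasi-scale range rather than forming a cumulative scale.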

The Guttman scale has some advantages and disadvantages. Its major advantages are as follows:

(i) The Guttman scale clearly demonstrates the unidimensionality of items. Such unidimensionality is assessed neither by the Likert scale nor by the Thurstone scale.

(ii) The unidimensionality and scalability of the scale enable the researchers to identify any inconsistencies in the responses and probable untruthful replies given by the subjects.

(iii) In the Guttman scale, the person's response pattern can be easily reproduced with a knowledge of his total score on the scale. This type of advantage is not found in case of the Thurstone scale or the Likert scale.

(iv) Researchers have shown that the Guttman technique of scaling is suitable for use when the number of dichotomous items (such as agree-disagree) is 12 or less. When the number of items exceeds 12, the technique becomes cumbersome.

The Guttman scale has, however, some disadvantages. The major ones are given below.

(i) The Guttman scale is not well suited to those items which have three or more response categories. Although some psychologists have applied this scaling technique in such a situation, it is very cumbersome and tedious to proceed with this technique.

(ii) The Guttman scale can't be used appropriately in situations where the number of items exceeds 12 and the number of subjects is more than 100. In such scalogram analysis, error determination will prove to be a Herculean task.

(iii) The Guttman scaling technique does not provide as extensive a continuum as we find in the case of Likert's and Thurstone's scaling techniques.

Despite these limitations, the Guttman scale has been successfully used in attitude assessment. Besides, the technique has also been used in opinion studies of a political, [...] and social nature. It may also be used in combination with Likert's and Thurstone's techniques to encompass a wide variety of attitude assessment.


Relationship between Attitude Scale and Measurement Scale 
There is a defined relationship between different attitude scales and measurement scales. It is very important to know the link or relationship between attitude scales and measurement scales because this will help in interpreting the scores on attitude scales. This relationship may be explained as under:
(i) Likert scale is based on ordinal scale. 
(ii) Thurstone scale is based on interval scale. 
(iii) Guttman scale is based on ratio scale. 





Review Questions

1. Describe and compare the scaling techniques of rank order and paired comparison.
2. Critically evaluate the advantages and disadvantages of the equal-appearing interval method.
3. Discuss the general procedure involved in a scale construction.
4. Discuss the contributions of psychophysical methods to the development of psychological research.
5. Describe the steps involved in developing a Likert type of scale to measure attitude towards family planning.
6. Explain the procedure of scaling by the method of paired comparison.
7. Discuss (i) Method of constant stimuli, and (ii) Guttman scale.
8. What do you mean by a unidimensional scale? Outline the steps involved in a unidimensional scale as suggested by Guttman.
9. Describe with examples, the major steps involved in construction of a scale by method of equal-appearing intervals.
10. Compare and contrast the method of summated ratings and the method of equal-appearing intervals. Which of the two do you consider better for developing an opinionnaire? Give reasons.
11. Write short notes on:
    (a) RL and DL
    (b) Guttman-Cornell technique
    (c) Psychological methods
    (d) Stevens' power law


Part Three 


PRINCIPLES OF 
RESEARCH METHODOLOGY 





14
SAMPLING





PREVIEW

* Meaning and Types of Sampling
  (A) Probability Sampling Methods
  (B) Nonprobability Sampling Methods
  (C) Mixed Sampling
* Need for Sampling
* Fundamentals of Sampling
* Principles of Sampling
* Aims of Sampling
* Factors Influencing Decision to Sample
* How Large Should a Sample Be?
* Methods of Drawing Random Samples
* Simple Random Sample
* Stratified Random Sample
  (A) Proportionate Stratified Random Sampling
  (B) Disproportionate Stratified Random Sampling
* Area (or Cluster) Sampling
* Quota Sampling
* Purposive or Judgemental Sampling
* Accidental Sampling
* Snowball Sampling
* Saturation Sampling and Dense Sampling
* Double Sampling
* Mixed Sampling
* Requisites of a Good Sampling Method
* Common Advantages of Sampling Methods
* Sampling Distribution
* Sampling Error



MEANING AND TYPES OF SAMPLING
Nearly all researches, experimental and nonexperimental, in the behavioural sciences, particularly in the fields of psychology, sociology and education, aim at some generalization regarding a well-specified and identifiable group on the basis of some selected members of that group. A well-specified and identifiable group is known as a population or universe, and the selected number of persons or objects is known as a sample. The generalized conclusions are technically known as the statistical inference. A population, therefore, may be defined as any identifiable and well-specified group of individuals. All primary school teachers, all college teachers, all university students, all housewives, etc., are examples of populations. A population may be finite or infinite. A finite population is one where all the members can be easily counted. An infinite population is one whose size is unlimited and therefore, its members cannot be counted. The population of university teachers is an example of a finite population and the population of fish in a river is an example of an infinite population because the former can be counted whereas the latter cannot be counted.



Likewise, a population may be real or imaginative: a real population is one which actually exists and an imaginative population is one which exists only in the imagination. In psychological and educational research, on many occasions, the population is imaginative. A measure based upon the entire population is called a parameter.

A sample is any number of persons selected to represent the population according to some rule or plan. Thus, a sample is a smaller representation of the population. A measure based upon a sample is known as a statistic.

Before we define the different types of samples (or the methods of sampling), it is essential to define the term 'probability', which is the base of sampling theory. In general usage, probability refers to something that is less than certain but for which there exists some evidence. In sampling theory, the term 'probability' is used as equivalent to the relative frequency. Thus when we say that the probability of a tail on a single toss of a coin is 1/2, it is meant that when we make several tosses, the relative frequency of a tail will be about 1/2 or 0.5. If one says that the probability of having a male child is 0.8, it is meant that on previous occasions the relative frequency of the birth of a male child has been 0.8. Probability may be expressed in terms of a fraction or in decimal numbers.
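
The relative-frequency meaning of probability can be illustrated with a small simulation. This is only a sketch: the function name is hypothetical, the coin is assumed to be fair, and a fixed seed is used so that repeated runs give the same figures.

```python
import random


def relative_frequency_of_tail(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times and return the share of tails."""
    rng = random.Random(seed)  # seeded generator, so the result is repeatable
    tails = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return tails / n_tosses


# The relative frequency settles near 1/2 as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_tail(n))
```

With only a handful of tosses the relative frequency can be far from 0.5; it is the long run of tosses that the probability statement describes.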

Following Blalock (1960), most sampling methods can be categorized into two:

(A) Probability Sampling Methods

(B) Nonprobability Sampling Methods

A discussion of these two is given below.



(A) Probability Sampling Methods
Probability sampling methods are those that clearly specify the probability or likelihood of inclusion of each element or individual in the sample. Technically, the probability sampling methods must satisfy the conditions given below.
(i) The size of the parent population or universe from which the sample is to be taken must be known to the investigator.
(ii) Each element or individual in the population must have an equal chance of being included in a subsequent sample.
(iii) The desired sample size must be clearly specified.
If, for example, a researcher knows that the population which he is going to study contains 500 elements, or individuals, and if he knows that all the elements (or individuals) are accessible and may be included in a subsequent sample, it can be said that each element (or individual) in the population has an equal chance, that is, 1/500 of a chance of being selected. This constitutes a probability sampling method. In practice, however, sometimes researchers are not able to say for sure that conditions (i) and (ii) will be satisfied. Sometimes the population studied is so large as to be considered infinite and unknowable for all important and practical purposes.

The positive point of the probability sampling method is that the obtained samples are considered representative, and hence the conclusions reached from such samples can be generalized to the populations to which they belong. The negative point of the probability sampling method is that a certain amount of sampling error exists because the researcher has only a limited element of the entire population. Sampling error reflects the degree to which the sample characteristics approximate the characteristics of the parent population. The smaller the sample, the greater the sampling error.

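
The three conditions above can be made concrete with a short sketch of drawing a sample from a known population of 500. The ID numbers, the sample size of 50 and the seed are assumptions chosen for illustration; only the population size of 500 comes from the text.

```python
import random

N = 500                                   # population size from the example above
population = list(range(1, N + 1))        # hypothetical ID numbers 1..500
sample_size = 50                          # assumed for illustration only

rng = random.Random(42)                   # fixed seed so the draw is repeatable
sample = rng.sample(population, sample_size)  # drawn without replacement

chance_on_one_draw = 1 / N                # 1/500, as stated in the text
chance_of_inclusion = sample_size / N     # chance of ending up in the sample
print(sorted(sample)[:5], chance_on_one_draw, chance_of_inclusion)
```

Because every element has the same chance on each draw, every element also has the same overall chance (here 50/500) of appearing in the final sample.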


The major probability sampling methods are the following.

1. Simple random sampling
2. Stratified random sampling
   (a) Proportionate stratified random sampling
   (b) Disproportionate stratified random sampling
3. Area or cluster sampling

(B) Nonprobability Sampling Methods
Nonprobability sampling is one method in which there is no way of assessing the probability of an element being included in the sample. Nonprobability sampling methods are those that provide no basis for estimating how closely the characteristics of a sample approximate the parameters of the population from which the sample had been obtained. This is because nonprobability samples don't use the techniques of random sampling. Important techniques of nonprobability sampling methods are:

1. Quota sampling
2. Accidental sampling
3. Judgemental or purposive sampling
4. Snowball sampling
5. Saturation sampling
6. Dense sampling



(C) Mixed Sampling
* Systematic sampling

Figure 14.1 presents the details of classification of sampling.


[Figure 14.1: tree diagram. Types of sampling: probability sampling (simple random sampling; stratified random sampling, proportionate and disproportionate; cluster sampling); nonprobability sampling (quota, accidental, judgemental or purposive, snowball, saturation and dense sampling); mixed sampling (systematic sampling).]

Fig. 14.1: Details of types of sampling



NEED FOR SAMPLING
Sampling is needed for a variety of reasons. Some of the important reasons are mentioned below.

(i) Sampling saves time as well as money. A research study based on sampling is completed within lesser time and incurs less expenditure than a study based upon other criteria.

(ii) A research study based upon sampling is generally conducted by trained and experienced investigators. As such, it provides accuracy in measurement and testing.

(iii) Sampling is also needed because it enables the researcher to estimate the sampling error and that way, it helps in getting information regarding some characteristics of the population.

(iv) Sampling is needed because it remains the only way when a population contains infinitely many members.

(v) Sampling helps in making correct and scientific judgement about the population about which generalization is to be made after the completion of the study.



FUNDAMENTALS OF SAMPLING

There are some fundamental concepts related to sampling that need to be explained. The important fundamental concepts are universe/population, sampling frame, sampling design, sampling error, sampling distribution, statistic(s) and parameter(s), confidence level and significance level. The meaning of universe/population, sampling error and sampling distribution has been explained elsewhere in this chapter. Consequently, we shall discuss here only the remaining fundamentals.



(1) Statistic(s) and Parameter(s)

A statistic is a numerical value based upon a sample, whereas a parameter is the numerical value based upon the population. For example, when we calculate the mean from a sample, this is an example of a statistic because it describes the characteristics of a sample. When the same mean is calculated from the population, it is called a parameter because it describes the characteristics of the population. Sampling analysis aims to obtain the estimate of a parameter from a statistic.



(2) Sampling frame

The elementary units form the basis of the sampling process and such units are called sampling units. A list containing all such sampling units is called a sampling frame. In other words, it can be said that the sampling frame consists of a list of items from which the sample is to be finally drawn. In some cases, it becomes impossible to draw a sample directly from a population. As such, the frame is constructed by the researchers for the purpose of the study or it may consist of some existing list of the population. For example, the researcher can use a telephone directory as a frame for conducting an opinion survey in a particular town.

(3) Confidence interval and Significance level

The confidence interval is the expected percentage of times that the actual value will fall within the stated precision limits. For example, if we take a confidence interval of 95%, then obviously, it means that there are 95 chances in 100 that the sample results present the true characteristics of the population within a specified precision range against 5 chances in 100 that they do not. Thus confidence interval indicates the likelihood that the answer will fall within the precision range. Significance level, on the other hand, is the level that indicates the likelihood that the answer will fall outside that range. If the confidence level is 95%, then the significance level is 100 − 95 = 5% or .05 and if the confidence level is 99%, the significance level is 100 − 99, that is, 1%. It would not be out of place to mention here that the area of the normal curve within precision limits for the specified confidence level constitutes the acceptance region, and the area of the curve outside these limits in either direction constitutes the rejection regions.
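
The relation between confidence level, significance level and the limits of the acceptance region on the normal curve can be sketched with Python's standard library. The function names are hypothetical; the confidence level is passed as a fraction (0.95 rather than 95), and a two-sided acceptance region is assumed.

```python
from statistics import NormalDist


def significance_level(confidence_level):
    """Significance level as the complement of the confidence level."""
    return 1 - confidence_level


def acceptance_limits(confidence_level):
    """z values bounding the central acceptance region of the normal curve."""
    alpha = significance_level(confidence_level)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # split alpha between both tails
    return -z, z


for level in (0.95, 0.99):
    low, high = acceptance_limits(level)
    print(level, round(significance_level(level), 2), round(low, 2), round(high, 2))
```

The region between the two z values is the acceptance region; the two tails beyond them, each holding half of the significance level, are the rejection regions.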


(4) Sampling design

By a sampling design, we mean a plan for obtaining a sample from the sampling frame. In other words, it refers to the procedure the researcher would adopt in selecting some sampling units from which inferences about the population are drawn. A good sampling design must possess the following characteristics:

(i) The sample design must yield a sample that is truly representative of the population.
(ii) The sample design should be such that it may result in small sampling error.
(iii) The sample design must allow for controlling bias in the sample.
(iv) The sample design must be usable in light of the funds available for the study.
(v) The sample design must be such that the results of the sample study can be applied, in general, to the population with reasonable precision.

Sampling designs are discussed in detail later on in this chapter.

PRINCIPLES OF SAMPLING
There are three principles of sampling that guide the theory of sampling. These principles are:

1. In most cases of sampling, there is a difference between the sample statistic and the true population parameter and this is attributable to the selection of units in the sample.
To explain this principle in a better way, let us take an example. Suppose there are five individuals and you know their age.

    Individuals    Age (in years)
    A              20
    B              22
    C              19
    D              20
    E              24
                   Total = 105

    Mean age = 105/5 = 21 years

Now let us suppose further that you want to select a sample of two persons for making an estimate of the mean age of these five persons (population mean or parameter mean). If you adopt the theory of probability, there can be 10 possible combinations of two: AB, AC, AD, AE, BC, BD, BE, CD, CE and DE. Now let us calculate the average (or mean) age of each of these pairs.

                                          Difference between sample
                                          mean and population mean
     1.  A + B = 20 + 22 = 42/2 = 21                 0
     2.  A + C = 20 + 19 = 39/2 = 19.5              -1.5
     3.  A + D = 20 + 20 = 40/2 = 20                -1
     4.  A + E = 20 + 24 = 44/2 = 22                 1
     5.  B + C = 22 + 19 = 41/2 = 20.5              -0.5
     6.  B + D = 22 + 20 = 42/2 = 21                 0
     7.  B + E = 22 + 24 = 46/2 = 23                 2
     8.  C + D = 19 + 20 = 39/2 = 19.5              -1.5
     9.  C + E = 19 + 24 = 43/2 = 21.5               0.5
    10.  D + E = 20 + 24 = 44/2 = 22                 1



Now look at the average ages calculated on the basis of the samples of two individuals (sample statistics). If you make a comparison of these averages with the mean age of the five individuals (population mean, that is, 21 years), you will find that in case of two combinations, that is, AB and BD, there is no difference between the mean of the sample and the population mean. In case of the remaining eight combinations, there is a difference. This difference is known as sampling error, whose size varies markedly. This analysis well illustrates the first principle of sampling stated above.
2. The second principle is that the greater the size of the sample, the more accurate will be the estimate of the population value or parameter.

This principle of sampling obviously relates the size of the sample with the accuracy of the estimate of the population mean. To illustrate this principle, let us carry the above example further. Now let us take a bigger size of sample, that is, of three individuals instead of two individuals and see what happens. There can be ten possible combinations of three individuals, out of a population of five individuals.

                                                   Difference between sample
                                                   mean and population mean
     1.  A + B + C = 20 + 22 + 19 = 61/3 = 20.33            -0.67
     2.  A + B + D = 20 + 22 + 20 = 62/3 = 20.67            -0.33
     3.  A + B + E = 20 + 22 + 24 = 66/3 = 22                1
     4.  A + C + D = 20 + 19 + 20 = 59/3 = 19.67            -1.33
     5.  A + C + E = 20 + 19 + 24 = 63/3 = 21                0
     6.  A + D + E = 20 + 20 + 24 = 64/3 = 21.33             0.33
     7.  B + C + D = 22 + 19 + 20 = 61/3 = 20.33            -0.67
     8.  B + C + E = 22 + 19 + 24 = 65/3 = 21.67             0.67
     9.  B + D + E = 22 + 20 + 24 = 66/3 = 22                1
    10.  C + D + E = 19 + 20 + 24 = 63/3 = 21                0

Now, make a comparison of the differences when the sample size was of two individuals with the differences when the sample size is of three individuals. Leaving aside the samples whose mean equals the population mean, the size of the difference between sample mean and population mean in case of sample size of two individuals lies between 0.5 and 2.00, whereas in case of the sample size of three, this difference is reduced and lies between 0.33 and 1.33. Thus it is clear that with enhancement in sample size, the sampling error is reduced and accuracy in the estimate of the population mean is enhanced.
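
The first two principles can be checked by enumerating every possible sample from the five ages given above. The following is a minimal sketch; the function name `sampling_errors` is hypothetical, and the sampling error is taken as sample mean minus population mean, as in the tables.

```python
from itertools import combinations
from statistics import mean

ages = {"A": 20, "B": 22, "C": 19, "D": 20, "E": 24}
population_mean = mean(ages.values())  # 21 years


def sampling_errors(sample_size):
    """Map every possible sample (e.g. 'AB') to sample mean minus population mean."""
    errors = {}
    for combo in combinations(ages, sample_size):
        sample_mean = mean(ages[person] for person in combo)
        errors["".join(combo)] = sample_mean - population_mean
    return errors


for n in (2, 3):
    errs = sampling_errors(n)
    largest = max(abs(e) for e in errs.values())
    print(f"n={n}: {len(errs)} samples, largest error {largest:.2f}")
```

Both sample sizes yield ten possible samples, and the largest error falls from 2 years with two individuals to about 1.33 years with three.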

3. The third principle is that the greater the difference in the variable under study in a population for a given sample size, the greater will be the sampling error, that is, the greater will be the difference between the sample statistic and the true population parameter.

This principle clearly states that when the population is heterogeneous or markedly varies with respect to the variable under study, there will be a greater difference between the sample statistic and the population true value. Let us take an example to illustrate this. Suppose we know that of the five individuals, the age of A is 20 years, that of B is 30 years, that of C is 25 years, that of D is 40 years and that of E is 45 years. Then the average age of these five individuals, who markedly differ in terms of age, will be

    Individuals    Age (in years)
    A              20 years
    B              30 years
    C              25 years
    D              40 years
    E              45 years
                   Total = 160 years

    Mean age = 160/5 = 32 years



If we take a sample of two individuals, their mean age and its difference from the population mean (32 years) will be as under:

                                          Difference between sample
                                          mean and population mean
     1.  A + B = 20 + 30 = 50/2 = 25                -7
     2.  A + C = 20 + 25 = 45/2 = 22.5              -9.5
     3.  A + D = 20 + 40 = 60/2 = 30                -2
     4.  A + E = 20 + 45 = 65/2 = 32.5               0.5
     5.  B + C = 30 + 25 = 55/2 = 27.5              -4.5
     6.  B + D = 30 + 40 = 70/2 = 35                 3
     7.  B + E = 30 + 45 = 75/2 = 37.5               5.5
     8.  C + D = 25 + 40 = 65/2 = 32.5               0.5
     9.  C + E = 25 + 45 = 70/2 = 35                 3
    10.  D + E = 40 + 45 = 85/2 = 42.5              10.5

If you look at the size of the difference, it varies from 0.5 to 10.5, thus producing a bigger sampling error. This happens because of the fact that the individuals in the population differ markedly in terms of the variable under study, that is, age. When they were more or less of the same age, as we had in the example of the first principle, the range of difference was between 0.5 and 2.00 only, thus showing a lower degree of sampling error.
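
The third principle can be sketched by putting the two populations side by side. The helper `error_range` is hypothetical; it reports the smallest nonzero and the largest absolute sampling error over all samples of two, together with the population standard deviation as a measure of heterogeneity.

```python
from itertools import combinations
from statistics import mean, pstdev


def error_range(ages, sample_size=2):
    """Smallest nonzero and largest absolute sampling error over all samples."""
    population_mean = mean(ages)
    errors = [abs(mean(c) - population_mean)
              for c in combinations(ages, sample_size)]
    nonzero = [e for e in errors if e > 0]
    return min(nonzero), max(errors)


similar_ages = [20, 22, 19, 20, 24]  # first example (mean 21 years)
varied_ages = [20, 30, 25, 40, 45]   # second example (mean 32 years)

for label, ages in (("similar", similar_ages), ("varied", varied_ages)):
    print(label, round(pstdev(ages), 2), error_range(ages))
```

The more spread-out population has the larger standard deviation and, for the same sample size, the wider range of sampling errors.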


AIMS OF SAMPLING
There are certain aims in selecting a sample. These aims are:

(i) To achieve maximum precision in estimating population values within a given sample size
(ii) To avoid bias in selecting a sample

The common observation is that bias in selecting a sample occurs in the following situations:

(a) When sampling is done by a nonrandom method, that is, when sampling is influenced by human choice
(b) When a section of the population refuses to cooperate or somehow is not available
(c) When the sampling frame does not cover the whole population



FACTORS INFLUENCING DECISION TO SAMPLE
Any behavioural researcher takes some decision regarding the sampling plan at the time of formulating the research design. His decision to sample is influenced by at least three factors:

1. Size of the population
2. Cost involved in obtaining the elements
3. Convenience and accessibility of the elements

These three may be discussed as follows:


1. Size of the Population
Decision to sample is directly influenced by the size of the population. If the population is small, say, for example, it consists of 300 individuals, the investigator may decide to include all in his study and, therefore, sampling may not be done. On the other hand, if the size of the population is large, say, it consists of 10,000 individuals, he may decide to select a limited number of individuals from the population of 10,000 individuals. Obviously, then it can be said that as the population becomes larger, sampling becomes increasingly important. It may also be mentioned here that the size of the population is a relative matter. What one investigator regards as a large population, the other may regard as a small one. No clearcut guidance exists for making a distinction between a large population and a small population.

2. Cost Involved in Obtaining the Elements
The investigator is also influenced by the cost likely to be incurred in obtaining the elements of the population. If sampling involves a bigger cost which the investigator cannot meet, the decision to sample may be postponed. On the other hand, if sampling involves a cost which the investigator can readily meet, the sampling work is greatly facilitated.


3. Convenience and Accessibility of the Elements
The decision to sample is also influenced by the convenience and accessibility of the elements. Sometimes the investigator may have to deal with a problem with respect to which the elements may not be conveniently available. For example, if the problem deals with investigating the causes of premarital sex relationship among college girls, the investigator may not find it convenient to trace out such girls who could readily tell about their affairs openly. In such a situation, sampling may be faulty. On the other hand, some investigators may have access to facilities and staff where a large amount of data could be easily handled. These investigators dare to sample even in those cases where the resulting data are complicated and complex.

Obviously, the decision to sample effectively is influenced by the size of the population, the anticipated cost of the study and the convenience and accessibility associated with the elements.


HOW LARGE SHOULD A SAMPLE BE?

It is often asked by students, "How large should the sample be in my study?" The best answer is that it depends upon many factors. In fact, a good researcher's decision about the best sample size depends upon three factors.

(a) Degree of accuracy required
(b) Degree of variability in the population
(c) Number of different variables examined simultaneously in data analysis

Other things being equal, the researcher needs larger samples if he wants a higher degree of accuracy, if the population has a great deal of heterogeneity or if he wants to examine many variables in the data analysis simultaneously. Smaller samples are considered good when less accuracy is acceptable, when the population is homogeneous or when the researcher wants only a few variables to be examined at a time. In general, the best principle of determining sample size is that if the population is small, a bigger sampling ratio is needed for an accurate sample. Larger populations permit smaller sampling ratios for equally good samples because as the population size grows, the returns in accuracy for sample size shrink. Let us take an example. For small populations (under 1,000), a researcher must have a larger sampling ratio (that is, at least about 30%). Thus, a sample size of about 300 is required for a high degree of accuracy. For moderately large populations (10,000), a sampling ratio of about 10 per cent is needed for achieving equal accuracy, that is, a sample size of 1,000 can be accurate. For larger populations (over 160,000), a smaller sampling ratio, that is, 1 per cent, may yield equally accurate results, that is, a sample of 1,600 can be accurate. These are, however, approximate sizes and practical limitations such as cost, time, etc. also play a role in the decision taken by the researcher.
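
The rule-of-thumb figures above are simply population size multiplied by sampling ratio, which can be sketched as below. The function name is hypothetical, and the population sizes used are the round figures quoted in the text, taken at their stated values for illustration.

```python
def sample_size(population_size, sampling_ratio):
    """Sample size implied by a population size and a sampling ratio."""
    return round(population_size * sampling_ratio)


# (population size, sampling ratio) pairs quoted in the text
examples = [(1_000, 0.30), (10_000, 0.10), (160_000, 0.01)]

for population, ratio in examples:
    print(population, f"{ratio:.0%}", sample_size(population, ratio))
```

The population grows 160-fold across the three cases while the suggested sample grows only about five-fold, which is the shrinking return the paragraph describes.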

Till the second decade of the 20th century, statisticians believed that samples should be large so that the normal probability table could be easily used to estimate sampling error as suggested by the central limit theorem. However, the research work of Sealy Gosset in 1915 clearly presented data on the probability distribution of small samples, which made possible the effective use of small samples. In general, samples of 30 or more are usually considered large samples and samples of less than 30 are considered small samples. At a sample size of 30, the critical values of the t table for small samples come nearer to the Z critical values of the normal probability table. Samples that are larger than necessary may also produce problems like considerable cost as well as administrative and logistic problems.


dence Interv 
ne basic ‘1 inferential 
the basic 
- ctatistics calculated from ¢ sample. oe: 
m station ™ <, it is also becoming important 
ch to hypothesis testing. 


als and Confidence Limits 
statistics is the estimation of parameters of populatior 
eae ‘ ) he 
Traditionally, this has been very important in surve 
* = 7 
in experimental research and can ey en 
In tact, this problem involves two 


Conf 


arch DU in recent year 
| alternative approd 


blems. | 
gint estimation 
estimation 
searcners estimate parameters using Si 
Thus in point estimation, 
ides an estimate of the population param 
that if the population mean is know 


ngle sample value, these estimates are 
a single sample value drawn from a 
eter. The question is: how good such an 
n to be 80, would a sample mean of 60 
sbout sample mean of 50, 75, 100, 105, etc.? Under what 
good? Since the researcher knows that population 
we generally use samples to estimate these 
¢ error the researcher is likely to make? 


(b) interval 
e re 


When th 
int estimales. 


ood estimate’ What 
e consider an estimate 
gre virtually never known and that 


parameters, is there any means to determine the amount of errs 
at this question |5 to put another related question: how much do means from a 


way [0 et 
peer ee of means V4 ry? One indication of such variation is to compute the standard deviation 
of the distribution of means. Like any standard deviation, how much scores (here sample means) 
nthe distribution typically vary. Since the interest O aking an estimate sf 
the population mean, any variation can be easily taken as an error. 
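The idea of a distribution of means can be made concrete with a small simulation. The sketch below is only illustrative: the population of scores, the sample size of 36 and the function name `simulate_sample_means` are assumptions, not taken from the text. It draws many random samples, records each sample mean, and compares the standard deviation of those means with the usual approximation (population standard deviation divided by the square root of the sample size), which is a standard result rather than something derived in this chapter.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

# Hypothetical population of 10,000 scores centred near 80.
rng = random.Random(1)
population = [rng.gauss(80, 12) for _ in range(10_000)]

means = simulate_sample_means(population, sample_size=36, n_samples=2_000)

# Standard deviation of the distribution of means (the standard error).
observed_se = statistics.stdev(means)
# Standard approximation: population SD divided by sqrt(sample size).
expected_se = statistics.pstdev(population) / 36 ** 0.5

print(round(observed_se, 2), round(expected_se, 2))
```

The two printed values should be close to each other, which is what allows the standard error to be treated as the typical size of the error in a point estimate.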
The researcher can also estimate the range of possible values that he confidently believes includes the population parameter. In other words, we can also estimate the range of possible means that are likely to include the population mean. This is known as interval estimation. Social scientists generally prefer interval estimates over point estimates. Let us illustrate the meaning of interval estimation with the help of a day-to-day example. Suppose you are forced to guess the weight of a man based on physical inspection. Since you don’t know the population value (the man’s true weight), you are likely to base your impression on his close physical inspection. You may say that the man weighs 60 kg. If you are asked, “How confident are you that he weighs exactly 60 kg?”, you would probably reply, “I doubt that he weighs exactly 60 kg. However, I feel reasonably confident that he weighs between 55 kg and 65 kg.” In fact, in doing this, you have estimated the weight of the man in terms of an interval between 55 kg and 65 kg. One important point to note here is that the greater the size of the interval, the greater is the feeling of certainty that the true value is encompassed between these limits. The interval within which we consider the estimate tenable is called the confidence interval, and the limits defining the interval are referred to as confidence limits.
Now let us look at a sample problem and apply our statistical concepts to interval estimation and confidence levels. Suppose the mean of a sample is 120 with a standard error of the mean of 6. If the researcher randomly takes a mean from this distribution, there is a 34% probability that the mean will be between 120 and 126 (one standard error above the mean). This happens because the distribution of means is a normal curve, the standard error is 1 standard deviation on that curve, and we know that 34% of a normal curve lies between the mean and 1 standard deviation above the mean. Likewise, the researcher could also reason that another 34% should be between 120 and 114 (1 standard error below 120). Putting these two together, the researcher has a range of 114 to 126 that he is 68% confident includes the population mean, if the sample was randomly taken from this population. We call it the 68% confidence interval; the upper confidence limit here is 126 and the lower confidence limit is 114.

Normally, however, the researcher wants to be more than 68% confident about the estimates. The standard practice is to use 95% or even 99% confidence intervals. For the 95% confidence interval, the researcher wants an area in the normal curve on each side between the mean and the z-score that includes 47.5%. The normal curve table clearly shows this to be at z = 1.96. In terms of z-scores, the 95% confidence interval is from −1.96 to +1.96 on the distribution of means. If we change these z-scores to raw scores in the above example, the lower limit would be (−1.96)(6) + 120 = −11.76 + 120 = 108.24 and the upper limit would be (1.96)(6) + 120 = 11.76 + 120 = 131.76. This shows that the researcher can be 95% confident that the true population mean lies between 108.24 and 131.76. Likewise, for the 99% confidence interval the researcher uses the z-score for the middle 99% of the normal curve. This comes to be approximately ±2.58. Changing this to raw scores, the 99% confidence interval will be from 104.52 to 135.48.
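The arithmetic of this example can be checked with a few lines of code. The following is a minimal sketch, assuming a normal distribution of means; the function name `confidence_interval` is illustrative and not part of the text. It uses `statistics.NormalDist` from the Python standard library to obtain the z-score for a given confidence level instead of reading it from a table.

```python
from statistics import NormalDist

def confidence_interval(sample_mean, standard_error, confidence):
    """Return (lower, upper) limits for a normal-theory confidence interval."""
    # z leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Sample mean 120 and standard error 6, as in the worked example.
for level in (0.68, 0.95, 0.99):
    lower, upper = confidence_interval(120, 6, level)
    print(f"{level:.0%}: {lower:.2f} to {upper:.2f}")
```

Because the code uses unrounded z values, its limits can differ in the second decimal place from figures worked out with rounded table values.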


METHODS OF DRAWING RANDOM SAMPLES

Randomness is one of the vital points in sampling, particularly probability sampling (Peatman, 1947). Randomness means that all elements in a given population have an equal and independent chance of being included in the sample. There are several methods through which the investigator maintains randomness in his sampling plans. Some of these methods are given below.

1. Fishbowl Draw Method
2. Using Table of Random Numbers
3. Method of Computer-determined Randomness

A discussion on these methods is given below.


1. Fishbowl Draw Method
This is a very simple method through which the elements from the population can be selected randomly. In this method all the elements (or individuals) of the population are numbered on slips of paper of equal size, colour, etc. All these slips are folded in one and the same way and are put in a container or bowl. After mixing the slips thoroughly, the investigator, blindfolded, selects one number at a time until the desired sample size is obtained.

Although this method is very simple for random selection of samples, it has some limitations.

(i) This method can’t be applied where the size of the population is large. It would be a cumbersome and tedious task to number each element of a population of, say, fifteen thousand cases. Moreover, mixing them in the container or bowl will also pose a big problem.
(ii) This method of random selection is too simple a method to be considered a scientific one. Here the investigator may select some numbers of slips from the bowl while purposefully excluding the others.


2. Using the Table of Random Numbers
For random sampling, the table of random numbers is considered appropriate, easy and convenient. A table of random numbers consists of a continuous row-column sequence of numbers which don’t appear in any particular sequence, nor does any number appear more frequently than the others. For using the table of random numbers the researcher specifies the number of elements in the population and then numbers them from 1 to N, where N is the total number of elements in the population. Suppose the population size is 500 and he intends to select 50 cases randomly from this population. He would enter the table at any point. He may move systematically to the right, left, up, down or diagonally, skipping the numbers that are too large and also those that have already been drawn. He would thus keep on moving systematically through the table until he has selected 50 elements.

Table 14.1 illustrates the random selection of 30 cases from a population of 80 serially numbered elements. In Table 14.1, twelve numbers have been omitted. For example, numbers 84, 97 and 95 have been omitted because they exceed 80, and numbers 03, 74 and 12 have been omitted because they appeared in previous selections. Number 00 has been omitted because the numbering starts from 01.


Table 14.1 Random selection of a sample (n = 30 has been taken from a serially numbered population of 80)

[Table body: rows of five-digit random numbers and the two-digit numbers read from them; not recoverable from the source.]


The advantage of using the table of random numbers is that it is easily accessible to researchers and requires no formal training to use. However, the disadvantage is that it can’t be easily and constructively used when the size of the population exceeds 5 digits.
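The reading procedure described for the table of random numbers can be sketched in code. This is an illustration under stated assumptions: the function name `select_from_random_digits` and the digit rows are hypothetical and chosen only for the example, and the table is read strictly left to right, which is one of the several directions the text permits.

```python
def select_from_random_digits(digit_rows, population_size, sample_size):
    """Read fixed-width numbers from rows of random digits, left to right.

    Numbers that are 0, exceed the population size, or repeat an earlier
    selection are skipped, as in sampling without replacement.
    """
    width = len(str(population_size))        # 80 -> two-digit numbers
    digits = "".join(digit_rows)
    selected = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if number == 0 or number > population_size:
            continue                          # outside the numbering 01..N
        if number in selected:
            continue                          # already drawn earlier
        selected.append(number)
        if len(selected) == sample_size:
            break
    return selected

# Hypothetical digit rows chosen for illustration only.
rows = ["5928128555", "0097710360", "1774017484"]
print(select_from_random_digits(rows, population_size=80, sample_size=10))
# -> [59, 28, 12, 55, 71, 3, 60, 17, 74, 1]
```

In this run 85 and 97 are skipped because they exceed 80, 00 is skipped because the numbering starts from 01, and the reading stops as soon as ten distinct numbers have been collected.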


3. Method of Computer-determined Randomness
This method is used when the size of the population is large. The data is fed into the computer, which selects the required number of elements corresponding to the elements in the population. This method is easier to adopt when computer facilities are available.

Of these three methods, the use of the table of random numbers is more popular for ensuring randomness in the method of selecting a sample.
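For computer-determined randomness, a minimal sketch using Python's standard `random` module is given below. The population size of 15,000, the sample size of 50 and the function name `computer_random_sample` are assumptions made for illustration; the seed is fixed only so that the draw can be repeated.

```python
import random

def computer_random_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` distinct serial numbers from 1..population_size."""
    rng = random.Random(seed)                 # seed only for reproducibility
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

chosen = computer_random_sample(population_size=15_000, sample_size=50, seed=42)
print(chosen[:5], len(chosen))
```

Like the fishbowl draw, `random.sample` selects without replacement, but it avoids the practical difficulty of numbering and mixing thousands of slips.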


SIMPLE RANDOM SAMPLING

A simple random sample (also known as an unrestricted random sample) may be defined as one in which each and every individual of the population has an equal chance of being included in the sample and the selection of one individual is in no way dependent upon the selection of another. For example, if we are to select a sample of 10 students from a class consisting of 40 students, we can write the name (or roll number) of each of the 40 students on separate slips of paper, all equal in size and colour, and fold them in a similar way. Subsequently, they may be placed in a box and reshuffled thoroughly. A person, then, may be asked to pick up one slip. Here, the probability of each slip being selected is 1/40. Suppose that after selecting the slip and noting the name written on the slip, he again returns it to the box. In this case, the probability of the second slip being selected is again 1/40. But if he does not return the first slip to the box, the probability of the second slip becomes 1/39. When an element of the population is returned to the population after being selected, it is called sampling with replacement, and when it is not returned, it is called sampling without replacement.

Sampling with replacement is wholly feasible but, except in certain circumstances, it is seldom used (Cochran, 1963, 19). If sampling with replacement is used, the chance of the same case being selected more than once is increased. In such a situation, the repeated cases may be ignored in making a selection of cases. Thus random sampling may be defined as one in which all possible combinations of samples of fixed size have an equal probability of being selected.
The major difference between sampling with replacement and sampling without replacement is mainly concerned with the number of possible samples of size n that could theoretically be drawn. In the case of sampling with replacement, the number of possible samples of size n would be greater than the number of possible samples of the same size (from the same population) in the case of sampling without replacement. This difference can be illustrated through an example. Suppose the population consists of 4 persons, who are named A, B, C and D. Further, suppose that the investigator wants to select samples of size 2 through the procedure of sampling without replacement. In such a situation, the investigator can maximally draw six samples of size 2 from the population of 4. This can be worked out with the help of the following equation:

\[ \binom{N}{n} = \frac{N!}{(N-n)!\,n!} \tag{14.1} \]

where, N = the size of parent population
n = the size of the sample
! = factorial

In the above example where N = 4 and n = 2, the maximum number of samples of size 2 would be 6 as under:

\[ \binom{4}{2} = \frac{4!}{(4-2)!\,2!} = \frac{4 \times 3 \times 2 \times 1}{(2 \times 1)(2 \times 1)} = \frac{24}{4} = 6 \]

By the same equation, a population of 5 yields 5 samples of size 4 as under:

\[ \binom{5}{4} = \frac{5!}{(5-4)!\,4!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{1 \times (4 \times 3 \times 2 \times 1)} = \frac{120}{24} = 5 \]

If the investigator has decided to proceed with the technique of sampling with replacement, he can derive the likely number of samples from the given population with the help of the following equation:

\[ N^{n} \tag{14.2} \]

(where the meaning of N and n is identical to the meaning in the case of Equation 14.1).

Suppose the size of the population is 4 and the size of the sample is 2. In such a situation, the investigator, following the technique of sampling with replacement, can maximally draw sixteen samples, that is, 4^2 = 4 × 4 = 16. If the four members of the population are named A, B, C and D, the sixteen samples of size 2 would be:

AA  AB  AC  AD
BA  BB  BC  BD
CA  CB  CC  CD
DA  DB  DC  DD

The AA, BB, CC and DD combinations reflect that in the case of sampling with replacement, an element or individual once drawn can be drawn again. In actual practice, such repeated cases are ignored.
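The two counts for the A, B, C, D example can be verified by listing the samples. The sketch below uses `itertools.combinations` for sampling without replacement (order ignored) and `itertools.product` for sampling with replacement (order kept); the variable names are illustrative assumptions.

```python
from itertools import combinations, product
from math import comb

population = ["A", "B", "C", "D"]
n = 2

# Without replacement, order ignored: N! / ((N - n)! n!) samples.
without_replacement = ["".join(pair) for pair in combinations(population, n)]

# With replacement, order kept: N ** n samples, including AA, BB, CC, DD.
with_replacement = ["".join(pair) for pair in product(population, repeat=n)]

print(comb(len(population), n), without_replacement)
print(len(population) ** n, with_replacement)
```

The first line lists the six samples AB, AC, AD, BC, BD and CD; the second lists all sixteen ordered pairs, four of which repeat the same person.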
There are advantages and disadvantages of simple random sampling. The major advantages of simple random sampling are given ahead.


1. A sample prepared on the basis of a simple random sampling plan is regarded as representative of the population from which it was drawn. This is because in such a sampling plan, all the elements in the population have an equal and independent chance of being included in the sample. When a sample is drawn in a way that does not give an equal chance for all elements to be included, but rather increases or decreases the likelihood of an element being included, the resultant sample is called a biased sample.

2. In simple random sampling, the investigator need not know the true composition of the population beforehand. Such a sample theoretically reflects all important characteristics and segments of the population.

3. With a view to understanding and application, simple random sampling is the easiest and simplest technique of all probability sampling plans.

4. Simple random sampling serves as a foundation upon which all other types of random sampling are based, because this method of sampling can be readily applied in conjunction with all other probability sampling plans.

5. In simple random sampling, the sampling error associated with any given sample drawn can easily be assessed.

6. In simple random sampling, the investigator does not commit classification errors because he need not know thoroughly the population characteristics prior to selection of the sample. Classification error means the error which results from the improper classification of population characteristics or segments.

However, simple random sampling has also some disadvantages as given below.


1. One of the major disadvantages is that simple random sampling does not ensure that those elements which exist in small numbers in the population will be included in the given sample. Suppose, in a population, only a very limited number of persons possess the X trait and the investigator is to draw only a small sample. If the investigator wants to include some persons possessing the trait, there is no guarantee that such persons would be drawn.

2. Another disadvantage of simple random sampling is that it does not fully exploit the knowledge the investigator has concerning the population. For example, the investigator may know the proportion of males and females in the population, the proportion of graduates and undergraduates, the proportion of persons belonging to high SES and low SES, etc., but he can’t utilize this information while drawing the sample. This mars the quality of representativeness of the sample.

3. In the case of simple random sampling, the sampling error is greater as compared with the sampling error incurred in a stratified random sample of the same size. This is because a stratified random sample is made proportional to some known characteristic of the population and is, therefore, more typical of the population. This known characteristic is ignored in simple random sampling and, therefore, the sampling error increases.

Despite these limitations, simple random sampling has been preferentially used in the assignment of elements randomly to different experimental conditions in psychological experiments as well as in raising a random sample for generalizing the obtained findings.

STRATIFIED RANDOM SAMPLING

In stratified random sampling the population is, first, divided into two or more strata, which may be based upon a single criterion such as sex, yielding two strata (male and female), or upon a combination of two or more criteria such as sex and graduation, yielding four strata, namely, male undergraduates, male graduates, female undergraduates and female graduates. These divided populations are called subpopulations, which are nonoverlapping and together constitute the whole population. Having divided the population into two or more strata which are considered to be homogeneous internally, a simple random sample of the desired number is taken from each population stratum. Thus in stratified random sampling, the stratification of the population is the first requirement. There can be many reasons for stratification in a population. Two of them are mentioned below.

1. Stratification tends to increase the precision in estimating the attributes of the whole population. If the whole population is divided into several internally homogeneous units, the variation in the measurements within each unit is almost nil. In such a situation a precise estimate can be made for each unit and, by combining all these estimates, we can make a still more precise estimate regarding the population.

2. Stratification gives some convenience in sampling. When the population is divided into several units, a person or group of persons may be deputed to supervise the sampling survey in each unit. Or, the possibility is that the institution conducting the sampling survey may have field branches to supervise the survey in each part or unit of the population.

Stratified random sampling is of two types.

(A) Proportionate stratified random sampling

(B) Disproportionate stratified random sampling

A discussion of these two types of sampling plans is given below.


(A) Proportionate Stratified Random Sampling

As its name implies, in this sampling plan the researcher stratifies the population according to the known characteristics of the population and, subsequently, randomly draws the individuals in a similar proportion from each stratum of the population. Suppose the investigator knows the classwise distribution of the population of the students in a university as given below (Table 14.2).





Table 14.2 Classwise distribution of a population of 10,000 students

Class       Composition of population    Proportion of each class
IA          N1 = 3,000                   0.30
BA          N2 = 4,000                   0.40
MA          N3 = 2,000                   0.20
M Phil.     N4 = 1,000                   0.10
Total       N  = 10,000                  1.00





Now, if the investigator has decided to draw a sample of size, suppose, 1,000, he will include the number of students from each stratum in a similar proportion. Thus for drawing individuals from each stratum of the population in a proportionate way, he will go through a procedure as under:

Number of students from IA Class      = 1000 × .30 = 300
Number of students from BA Class      = 1000 × .40 = 400
Number of students from MA Class      = 1000 × .20 = 200
Number of students from M Phil. Class = 1000 × .10 = 100
                                                 Σ = 1000

The proportionate breakdown of a sample of 1,000 students has been shown in Table 14.3.

Table 14.3 A proportionate breakdown of a sample of 1,000 students

Class       Composition of sample    Proportion of each class
IA          N1 = 300                 .30
BA          N2 = 400                 .40
MA          N3 = 200                 .20
M Phil.     N4 = 100                 .10
Total       N  = 1,000               1.00


Having decided the proportion of each stratum in the sample in proportion to the population 
stratum and the number of students to be included in each stratum, the investigator goes on 
randomly drawing the students in each stratum until the desired sample size is reached. This is 
easily done with the help of either a table of random numbers or any suitable technique 
of randomness. 
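The proportionate allocation and the random draw within each stratum can be sketched as follows. The function names `proportionate_allocation` and `stratified_sample` and the generated roll numbers are assumptions for illustration. The class sizes follow the university example, with the M Phil. figure of 1,000 inferred from the total of 10,000.

```python
import random

def proportionate_allocation(stratum_sizes, sample_size):
    """Allocate a sample across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample of the allotted size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, allocation[name])
            for name, members in strata.items()}

sizes = {"IA": 3000, "BA": 4000, "MA": 2000, "M Phil": 1000}
allocation = proportionate_allocation(sizes, sample_size=1000)
print(allocation)

# Hypothetical roll numbers standing in for the students of each class.
strata = {name: [f"{name}-{i}" for i in range(1, size + 1)]
          for name, size in sizes.items()}
sample = stratified_sample(strata, allocation, seed=7)
print({name: len(drawn) for name, drawn in sample.items()})
```

With these round figures the allocation adds up to exactly 1,000. With less convenient proportions, simple rounding may leave the total one or two cases off, and the investigator would have to adjust a stratum by hand.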

Proportionate stratified random sampling has many research applications. It is readily and easily applied in military studies where ranks are likely to affect other variables in the research. Likewise, such sampling is considered important in the study of school systems and administrative services where the population remains divided into many strata. To make the sample representative of the population, it is essential to have a proportionate representation.

Proportionate stratified random sampling has some advantages and disadvantages. The important advantages are given below.

1. Proportionate stratified random sampling increases the representativeness of the sample drawn. The representativeness is strengthened because of the fact that this method of sampling ensures that those elements that exist in a few numbers are also included proportionately in the sample.

2. In proportionate stratified random sampling, the sampling error is minimized because the sample drawn possesses all the necessary characteristics of the parent population.

3. Proportionate stratified random sampling eliminates the need to weight the elements according to their original distribution in the population. Since the sample is proportionately stratified according to several salient features, the sample is weighted automatically. The weight of any particular value is the frequency of its occurrence in the population.

Despite these advantages, proportionate stratified random sampling has some limitations or disadvantages as indicated below.

1. Proportionate stratified random sampling is a difficult method. This method assumes that the investigator knows the composition and distribution of the population well before the actual sampling starts. This is often an unrealistic assumption.

2. This method is a time-consuming method because the samples are drawn according to the proportion of each stratum in the population.

3. This method has the probability of classification error. Since the investigator has to identify the strata, he is likely to classify an element of the population in such a way as to put it into the wrong stratum. This will affect the validity and reliability of the obtained results.

(B) Disproportionate Stratified Random Sampling

Disproportionate stratified random sampling bears similarity with proportionate stratified random sampling. The only difference is that the substrata of the drawn sample are not necessarily distributed according to their proportionate weight in the population from which they were randomly selected. In fact, some of the strata of the population may be overrepresented and some underrepresented. Sampling disproportionately means that
(a) either the investigator will give equal weight to each of the substrata, or
(b) he will give greater representation to some substrata and not enough weight to other substrata of the population in the sample to be drawn.

Suppose the investigator divides a given population of 10,000 individuals into 6,000 males and 4,000 females. If he has decided to draw a sample of 1,000 individuals from the given population and if he draws randomly both the males and the females in equal number, say, 500 each, it will constitute an example of disproportionate stratified random sampling. But if he randomly draws 600 males and 400 females in his sample, it will constitute an example of proportionate stratified random sampling. From this example it becomes obvious that in the disproportionate stratified random sample the investigator tries to give equal weight to each stratum, that is, he tries to draw an equal number of individuals from each stratum. In doing so, he overrepresents one stratum while underrepresenting the other strata. In this example, when he randomly draws 500 males and 500 females, he is overrepresenting the female stratum and underrepresenting the male stratum.
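The male and female example can be sketched in code to show where the over- and underrepresentation comes from. The function names and the generated identifiers below are hypothetical. The sampling fraction of each stratum (cases drawn divided by stratum size) makes the imbalance visible.

```python
import random

def equal_allocation_sample(strata, per_stratum, seed=None):
    """Draw the same number of cases from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

def sampling_fractions(strata, per_stratum):
    """Share of each stratum that ends up in the sample."""
    return {name: per_stratum / len(members)
            for name, members in strata.items()}

# Hypothetical identifiers for 6,000 males and 4,000 females.
strata = {
    "male": [f"M{i}" for i in range(6000)],
    "female": [f"F{i}" for i in range(4000)],
}
sample = equal_allocation_sample(strata, per_stratum=500, seed=3)
fractions = sampling_fractions(strata, per_stratum=500)
print(fractions)
```

Here 500 of 4,000 females (12.5 per cent) are drawn against 500 of 6,000 males (about 8.3 per cent), so the female stratum is overrepresented relative to its share of the population.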

Disproportionate stratified random sampling has some advantages and disadvantages. Its major advantages are indicated as follows:

1. Disproportionate stratified random sampling is comparatively less time-consuming than proportionate stratified random sampling because here the investigator is not worried about making a proportionate representation of each stratum of the population.

2. In disproportionate stratified random sampling, the investigator is able to give weight to the particular groups of elements that are not represented as frequently in the population as compared with other elements.

The major disadvantages of disproportionate stratified random sampling are given below:

1. In this method of sampling, a certain stratum of the population is overrepresented and some other strata are underrepresented in the samples drawn. The assignment of greater weight to one set of elements, or overrepresenting one stratum of the population, may introduce some bias in the sample. Such a sample may not be truly representative.

2. Disproportionate stratified random sampling assumes that the investigator knows the composition of the original population. This automatically means that this method of sampling is not suitable in a situation where the investigator has no idea about the composition of the original population.

3. In this method of sampling the investigator is required to classify the population into different substrata and from each substratum he takes a more or less equal number of cases. In drawing a sample from such a classified set, he may actually misclassify certain elements, that is, he may put element A into stratum X whereas A might belong to stratum Z.


Despite these limitations, the use of disproportionate stratified random sampling is common in social science researches. In fact, because of ease and convenience it is more popular than proportionate stratified random sampling.

AREA (OR CLUSTER) SAMPLING

Area (or cluster) sampling is another important method of probability sampling. Such sampling has its origin in the field of agriculture. Farming experiments that were conducted to assess the effect of various kinds of fertilizers, soil treatments and a variety of seeds on the yield mostly used this method of sampling. In social sciences its application has been extensive in survey research and field research.

In area sampling, generally, geographical divisions of territory, states, etc., are made on a map and a certain number of them is drawn randomly. The investigator or interviewer proceeds to interview all elements of the randomly drawn clusters. That is the reason why this method of sampling is also known as cluster sampling.

Suppose the investigator wants to assess the attitude of the people of Tamil Nadu towards family planning. For this, it will be convenient for the investigator to have the map of Tamil Nadu before him and then divide it into various sections according to a number of vertical and horizontal lines drawn across the total area. He will then number each section from 1 to N, N being equal to the total number of sections. With the help of the table of random numbers he may draw a specified number of sections to constitute the sample that he will study. The investigator will, then, interview all persons or members of families living in those sections. If any section contains extremely different types of families, a random selection from among those families can again be made and the selected families finally interviewed. In this way, according to necessity, further subdivision and selection of samples can be done at different stages. This is called multi-stage sampling.
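The steps of area sampling (number the sections, draw some at random, then take everyone in them, or a random subsample at a second stage) can be sketched as follows. Everything concrete here is an assumption for illustration: the grid of 20 sections, the family labels, and the function names `cluster_sample` and `second_stage`.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and return every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {section: clusters[section] for section in chosen}

def second_stage(selected, per_cluster, seed=None):
    """Optional second stage: a random subsample within each chosen cluster."""
    rng = random.Random(seed)
    return {section: rng.sample(members, min(per_cluster, len(members)))
            for section, members in selected.items()}

# Hypothetical map grid: 20 numbered sections, each holding 5 to 30 families.
layout = random.Random(0)
clusters = {section: [f"S{section}-F{i}" for i in range(layout.randint(5, 30))]
            for section in range(1, 21)}

stage_one = cluster_sample(clusters, n_clusters=4, seed=11)
stage_two = second_stage(stage_one, per_cluster=5, seed=11)
print(sorted(stage_one), {k: len(v) for k, v in stage_two.items()})
```

Note that the investigator needs only the list of sections, not a list of individual families across the whole area, before the first draw is made.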
As given hereunder, area sampling has many research utilities or applications.
(i) Large-scale surveys of political, religious and social behaviour are easily conducted by area sampling.
(ii) Where lists of specific individuals are unobtainable or inaccessible, area sampling becomes the best method of sampling.
(iii) Public opinion polls are easily and smoothly conducted using area sampling.

Area sampling has some advantages and disadvantages. The important advantages are given below.

1. When larger geographical areas are to be covered, it is easier to use area sampling than other methods of probability sampling. It is easier in the sense that the investigator need not have the list of the individuals inhabiting a given area. He simply draws some geographical sections randomly and, subsequently, he interviews all the families or persons living in the randomly drawn sections. In this sense, area sampling is easier than the other two preceding methods of probability sampling.

2. In area sampling, respondents can readily be substituted for other respondents within the same random section. This is permitted because clusters of elements are sampled and not the individuals.

3. Area sampling saves both time and money. The investigator can concentrate his efforts in one specific section and, thus, can save time. The cost of such sampling is much less than that of the other methods of probability sampling. The investigator need not travel long distances to interview specific individuals residing at random points in a certain geographical region.

4. Area (or cluster) sampling possesses the trait of flexibility. In a multi-stage sampling design, the investigator can successfully employ different forms of sampling in several successive stages. For example, after drawing several sections randomly in the first stage, the investigator may decide to stratify each of the first-stage units and select sections from some of the strata as the second-stage units. The investigator may use a third type of sampling in the third or fourth stage.

5. Still another advantage of area sampling is that the respondents can be substituted for other respondents within the same random section. This further adds a degree of flexibility to area sampling.
Area sampling has some disadvantages too. The important disadvantages are given below.

1. In area sampling, the degree of sampling error is usually high. This makes area sampling much less fruitful or dependable in comparison to other methods of probability sampling.

2. In area sampling, there is no correct way to ensure that each sampling unit included in an area sample will be of equal size. In fact, here the researcher has little control over the size of each cluster. This introduces bias into the samples.

3. In area sampling, it is also difficult to ensure that the individuals included in one cluster are independent of other randomly drawn clusters. It may be that an individual is interviewed in one cluster and next morning he travels to another area that falls within another randomly selected section.

Despite these disadvantages, area sampling is a common and popular method of sampling in behavioural researches.


QUOTA SAMPLING 


Quota sampling is one of the important types of nonprobability sampling methods, which is apparently similar to stratified random sampling. In quota sampling the investigator recognizes the different strata of the population and, from each stratum, he selects the number of individuals arbitrarily. This constitutes the quota sample.

Suppose the investigator knows that the population of the individuals he is going to study has three strata in terms of SES (socio-economic status): high, middle and low. Further, suppose he knows that there are 1,000 people in high SES, 7,000 people in middle SES and 2,000 people in lower SES. Thus, the population consists of 10,000 individuals. If he wants to select 1,000 individuals and, finally, selects 100 individuals from high SES, 700 from middle SES and 200 from lower SES, according to his convenience (and not randomly), this constitutes a quota sample.
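
The arithmetic behind the quotas in this example can be sketched in a few lines of Python. This is only an illustration: the function name `proportional_quotas` is hypothetical and is not part of the text, and the code fixes only the size of each quota. In quota sampling the individuals who fill each quota are still chosen by convenience, not randomly.

```python
def proportional_quotas(strata_sizes, sample_size):
    """Return a quota for each stratum in proportion to its share of the population.

    Integer (floor) division is used, so for other figures the quotas
    may add up to slightly less than sample_size.
    """
    total = sum(strata_sizes.values())
    return {name: sample_size * size // total for name, size in strata_sizes.items()}


# Figures taken from the SES example in the text.
strata = {"high": 1000, "middle": 7000, "low": 2000}
quotas = proportional_quotas(strata, 1000)
print(quotas)
```

With the figures of the example, the quotas come out as 100, 700 and 200 and add up to the intended sample of 1,000.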


It is obvious that there is similarity between quota sampling and proportionate stratified random sampling, with the only difference that in the former the final selection of individuals is not random, whereas in the latter the final selection of individuals is random. Keeping in view this similarity, quota sampling is sometimes referred to as ‘the poor man’s proportionate stratified sample’, because such sampling seeks to ensure that specific elements are included and represented in subsequent collection of them.

Quota sampling has some advantages and disadvantages. The major advantages of quota sample are stated below.

1. Quota samples are the most satisfactory means when quick and crude results are desired.

2. This method of sampling is convenient and less costly than many other methods of sampling, whether probability or nonprobability.

3. Quota sampling, to a greater extent, can guarantee the inclusion of individuals from different strata of population.

However, quota sampling has some disadvantages also, as given below.

1. In quota sampling, there is no means of establishing randomness. As such, the selected samples remain no longer representative of the population. The conclusions, therefore, lack external validity or generalizability.

2. In quota sampling, the researchers or interviewers get ample opportunity to select the most readily accessible individuals, including their friends and relatives. Such individuals may not be typical of the population they are going to study.

3. Quota sampling is amenable to classification error. Here the interviewer or the investigator bases his classification of respondents on the way they apparently look to him. In so doing, he possesses no knowledge concerning the way respondents should be classified. The investigator remains ignorant of facts that might otherwise help him in classifying them. All these tend to make quota sampling less dependable and reliable.

4. In quota sampling, the researcher, to a greater extent, controls one variable such as SES, but he can’t control other variables that may have both theoretical and practical significance. This mars the dependability of quota sampling.


Despite these limitations, quota sampling is a popular method among nonprobability methods of sampling, because it enables the researcher to introduce a few controls into his research plan.


PURPOSIVE OR JUDGEMENTAL SAMPLING


A purposive sample, a kind of nonprobability sample, is one which is based on the typicality of the cases to be included in the sample. The investigator has some belief that the sample being handpicked is typical of the population or is a very good representative of the population. A purposive sample is also known as a judgemental sample because the investigator, on the basis of his impression, makes a judgement regarding the concerned cases, which are thought to be typical of the population. For studying attitudes towards any national issue, a sample of journalists, teachers and legislators may be taken as an example of purposive samples because they can more reasonably be expected to represent the correct attitude than other classes of persons residing in the country. Before the start of general elections, purposive samples are often taken in an attempt to forecast the national elections. The investigator selects the persons from those states whose election results on previous polls have approximated the actual results and thus, have been typical of the whole population.

Purposive sampling has some advantages and disadvantages. The important advantages are given below.



1. Since purposive sampling does not involve any random selection process, it is somewhat less costly and more readily accessible to the investigator.

2. Purposive sampling is a very convenient method of sampling as compared to other methods of nonprobability sampling.

3. Purposive sampling guarantees that those individuals will be included in the sample who are relevant to the research design. The investigator does not get such guarantee in any other methods of nonprobability sampling.

Purposive sampling has some disadvantages also, as given below.

1. In purposive sampling, there is no way to ensure that the sample is truly random or representative of the population despite the belief in typicality of the sample. This inhibits the investigator's ability to generalize the findings.


2. In purposive sampling, too much emphasis is placed on the ability of the investigator to assess which elements or individuals are typical of population and which are not. This gives ample scope for introducing subjectivity in the sampling. Once he has selected the sample, he assumes that errors arising in his selection method will be minimal, but actually there is no legitimate stand for arguing his position.

3. In the case of purposive sampling the inferential statistics can’t be used legitimately because all inferential statistics assume randomness. This criticism also applies to other forms of nonprobability sampling method.

ACCIDENTAL SAMPLING

Accidental sampling, also known as incidental sampling, is another popular method of nonprobability sampling plan. It refers to a sampling procedure in which the investigator selects the persons according to his convenience. Here the sample consists of people who are willingly available and suitable to take part and are part of the target population. Pennington (2002) has called such a sample an opportunity sample. Here the investigator does not care about including the people with some specific or designated trait, rather he is mainly guided by convenience and economy. Heiman (1995) has called it convenience sampling. This is a crude method of sampling, and the investigator knows that little can be generalized from the sample thus drawn.

An investigator may take students of class X into the research plan because the class teacher of that class happens to be his friend. This illustrates accidental sampling.

Accidental sampling has some advantages and disadvantages. The important advantages are given below.

1. Accidental sampling is the most convenient method of sampling.

2. This method of sampling possesses the trait of economy. This method saves time, money and labour of the investigator.

However, accidental sampling has some disadvantages as well. 

1. From accidental samples, nothing can be generalized with confidence because the samples remain no longer representative of the population.

2. In accidental sampling, the investigator gets ample opportunity to show his bias and prejudice in selecting the individuals. As such, this method of sampling is not very dependable.

3. In accidental sampling, the probability of sampling error is high. Therefore, the validity 
and reliability of this method are badly affected. 

Despite these disadvantages, it will not be an exaggeration to say that in many psychological 
and sociological researches this method of sampling is frequently used. 


SNOWBALL SAMPLING 


Snowball sampling, which is a nonprobability sampling method, is basically sociometric in nature. It is a process of selecting a sample using networks of friends and acquaintances. It is defined as having all the persons in a group or organisation identify their friends, who in turn identify their friends and associates, until the researcher observes that a constellation of friendships converges into some type of a definite social pattern. Some selected behaviour is usually used as the basis of contact and/or association. Obviously, then, snowball sampling is a way of securing an impression of informal social relations among individuals. In fact, it is often used to reach a ‘hidden’ population such as hardened criminals or prostitutes. Here the researcher identifies one potential subject and, from him, obtains the names and addresses of other subjects; from them, he obtains the names of still more subjects, so that the sample grows like a rolling snowball.

An early study (1957) used snowball sampling successfully to build up a picture of the diffusion of information among physicians. The purpose was to determine how physicians come to use certain medical materials such as drugs and related supplies. Snowball sampling helped locate the physicians’ informal cliques through which information is passed, and thus the pattern of diffusion was revealed. It may answer questions such as: Do the physicians read about the drug in a journal or hear about it in a medical seminar? If they hear about it at a medical convention, whom do they contact among their colleagues regarding it? How does information about a given drug spread among physicians? And so on.

Snowball sampling has important research applications in industrial organisations where N is expected not to exceed 100. Besides, such sampling can also be convenient to the studies of social change and diffusion of information in social organizations.

Snowball sampling has some advantages and disadvantages. The important advantages are:

1. Snowball sampling, which is primarily a sociometric sampling technique, has proved important and is helpful in studying a small, informal social group and its impact upon formal organizational structure.

2. Snowball sampling reveals communication patterns in community organization; concepts like community power and decision-making can also be studied with the help of such sampling technique.

3. The method of snowball sampling is amenable to various scientific sampling procedures at various stages, such as use of random numbers or computer determination.

Despite these advantages, snowball sampling has some limitations also, as described below.

1. Snowball sampling becomes cumbersome and difficult when N is large, say, when it exceeds 100.

2. This method of sampling does not allow the researcher to use probability statistical methods. In fact, the elements included in the sample are not randomly drawn and they are dependent on the subjective choices of the originally selected respondents. This introduces some bias in the sampling.

Despite these limitations, snowball sampling is used in the study of diffusion of some specific information and social changes.
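
The referral process of snowball sampling, in which each located subject names further subjects, can be sketched as a simple traversal of a referral network. The sketch below is only an illustration under stated assumptions: the `referrals` table, the labels A to E and the function name `snowball_sample` are hypothetical, and real referrals would come from interviews and not from a ready-made table.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Follow referral chains from the seed subjects, first come first served,
    until max_size subjects are collected or no new names are offered."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        # Each subject names further subjects; only new names join the queue.
        for named in referrals.get(person, []):
            if named not in seen:
                seen.add(named)
                queue.append(named)
    return sample


# Hypothetical referral data: who names whom.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["A"],
}
sample = snowball_sample(referrals, seeds=["A"], max_size=4)
print(sample)
```

Because the final sample depends entirely on whom the first subjects choose to name, the sketch also shows why such a sample cannot be treated as randomly drawn.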


SATURATION SAMPLING AND DENSE SAMPLING 


Coleman (1959) has emphasized these two types of sampling techniques, which are used less frequently as compared to other techniques of sampling. Saturation sampling is defined as drawing all elements or individuals having characteristics of interest to the investigator. Drawing all physicians aged at least 45 (from a small community) would be called saturation sampling. Dense sampling is a method of sampling which lies somewhere between simple random sampling and saturation sampling. When the researcher selects 50% or more from the population and takes a majority of individuals having specified traits or characteristics which are of interest to him, it is called dense sampling. For example, if the researcher selects 500 to 600 students from a population of 1,000 students all having distinction marks in any one examination paper, it will constitute dense sampling.

The advantage of saturation sampling and dense sampling is that these are convenient methods of sampling where N of the population does not exceed 1,000. In case it exceeds, these methods of sampling become cumbersome and inconvenient.


DOUBLE SAMPLING

Double sampling, as its name implies, is defined as drawing a sample of individuals from another sample of them. Suppose the investigator randomly draws a sample of 1,000 from a population having N = 10,000. From these 1,000 individuals, he again randomly draws a smaller sample for further study. This is called double sampling.

Suppose the researcher is studying the attitude of newly married couples towards family planning through a mail questionnaire. For this, he mails 1,000 questionnaires to couples residing in different localities. He finds that only 40% (that is, 400) of the questionnaires are returned. From these 400 persons, he randomly draws a sample of 100 and mails another questionnaire to get their in-depth knowledge of the different techniques of family planning. This is called double sampling.

For double sampling to be an effective part of the research, it is essential that the sampling plan followed from the beginning should maximize the representativeness of the individuals drawn subsequently. This method has the disadvantage of taking much time and labour of the researchers.
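
The two successive draws of double sampling can be sketched as follows. This is only an illustration: the population is a list of hypothetical ID numbers, the second-phase size of 100 is an assumed figure, and the fixed seed is used only so that the sketch is repeatable.

```python
import random

rng = random.Random(42)  # fixed seed, only to make the sketch repeatable

population = list(range(1, 10_001))           # hypothetical IDs, N = 10,000
first_phase = rng.sample(population, 1_000)   # first random sample of 1,000
second_phase = rng.sample(first_phase, 100)   # second sample drawn from the first (assumed size)

print(len(first_phase), len(second_phase))
```

Every member of the second sample necessarily belongs to the first, which is why the representativeness of the first draw limits what the second can achieve.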


MIXED SAMPLING

Mixed sampling is one where the characteristics of both probability sampling and nonprobability sampling are mixed. Systematic sampling is one example of it.

Systematic sampling may be defined as drawing or selecting every nth person from a predetermined list of elements or individuals. Selecting every 5th roll number in a class of 60 students will constitute systematic sampling. Likewise, drawing every 8th name from a telephone directory is an example of systematic sampling. If we pay attention to the systematic sampling plan, it becomes obvious that such a plan possesses certain characteristics of randomness (the first element selected is a random one) and, at the same time, possesses some nonprobability traits such as excluding all persons between every nth element chosen.
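
The procedure of taking a random first element and then every nth element can be sketched as below, using the example of 60 roll numbers and every 5th one. The function name `systematic_sample` and the seed value are assumptions made only for this illustration.

```python
import random


def systematic_sample(items, k, rng=random):
    """Pick a random starting point among the first k items,
    then take every k-th item from there onwards."""
    start = rng.randrange(k)  # the only random step in the plan
    return items[start::k]


roll_numbers = list(range(1, 61))  # a class of 60 students
sample = systematic_sample(roll_numbers, 5, random.Random(7))
print(sample)
```

Only the starting point is left to chance. Once it is fixed, every other member of the sample is determined, and all persons lying between the selected positions are excluded.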

Systematic sampling has some advantages and disadvantages. The important advantages are mentioned below.

1. Systematic sampling is relatively a quick method of obtaining a sample of elements. If the investigator has a short time schedule, this method of sampling eliminates several steps otherwise taken in different methods of sampling.

2. Systematic sampling makes it very easy to check whether every nth number or name has been selected. In case there occurs any error in counting, that is, if the investigator selects the 6th number instead of the 5th number, his sample will not be seriously affected.

3. Systematic sampling is very easy to use. In fact, it is much simpler than having to use a table of random numbers for drawing the sample or a fixed quota from each stratum of the population in order to have proportional representation.


Despite these advantages, there are some limitations of systematic sampling, as indicated below.

1. Systematic sampling ignores all persons between every nth element chosen. Obviously, it is not a probability sampling plan.

2. In systematic sampling, the sampling error increases if the list is arranged in a particular order, that is, if it increases or decreases with respect to some trait such as age, income, caste, etc. In such a situation, a bias will be introduced in the sample because more individuals may be drawn from one group than from another. Such type of bias may also result when the list is alphabetical (Blalock 1972).

Despite these limitations, systematic sampling is used most frequently in the psychological and sociological researches because this method of sampling possesses the trait of both probability and nonprobability sampling.


REQUISITES OF A GOOD SAMPLING METHOD


A sampling method, to be good and scientifically sound, must possess at least the following two properties.

1. It must ensure the representativeness of the sample.
2. It must ensure the adequacy of the sample.

The first requirement of a sampling method is that the selected samples must be representative of the population. What is meant by representativeness of the sample? In the words of Kerlinger (1973, 119), “Ordinarily, representative means to be typical of a population, that is, to exemplify the characteristics of the population. In research a representative sample means that the sample has approximately the characteristics of the population relevant to the research in question.” Truly speaking, an investigator cannot be certain about the representativeness of a sample unless the entire population is tested. But this is ordinarily not feasible. Hence, he must satisfy himself otherwise. A sample can be regarded as being representative of the population when it possesses all the relevant characteristics of the population in about the same proportion as is in the population. If sex is the relevant characteristic of a population of students of a particular university and if the population has a male-female ratio of 60:40, a representative sample would be one where the ratio of males and females is also 60:40. For ensuring representativeness of samples it is also essential that the population should be clearly defined and a suitable and appropriate definition of the observations, which are to constitute samples, be framed. An observation must be defined in the light of the definition of population; otherwise the result will be a biased sample. Suppose that the population is defined as all those females who are widows. Further, suppose that the observation is defined in terms of selection of those who admit that they are widows. In this case, the resulting samples will not be representative ones because observation has been defined differently from the population. Obviously, this definition of observation will automatically exclude those who refuse to admit that they are widows and therefore, the sample will be a biased one.
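
The idea that a representative sample carries the relevant characteristic in about the same proportion as the population can be checked with a small sketch. The helper names `proportions` and `max_gap` and the two illustrative samples are assumptions made for this example; only the 60:40 ratio comes from the text.

```python
from collections import Counter


def proportions(values):
    """Share of each category among the given values."""
    counts = Counter(values)
    n = len(values)
    return {category: count / n for category, count in counts.items()}


def max_gap(population, sample):
    """Largest absolute difference between population and sample proportions."""
    pop_p = proportions(population)
    sam_p = proportions(sample)
    return max(abs(pop_p[c] - sam_p.get(c, 0.0)) for c in pop_p)


population = ["M"] * 600 + ["F"] * 400      # 60:40, as in the text
representative = ["M"] * 60 + ["F"] * 40    # same ratio in the sample
skewed = ["M"] * 90 + ["F"] * 10            # a biased sample

print(max_gap(population, representative))
print(max_gap(population, skewed))
```

A gap near zero suggests representativeness on this one characteristic only. It says nothing about characteristics that were not compared.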

The second requirement of good sampling is that the samples must be adequate. A sample is said to be adequate when it is of sufficient size. There is no rigid rule regarding the size of a sample. But a larger sample, in general, is preferred because it tends to reduce the error, which is the difference between the population value and the sample value. The mean of a sample may differ from the mean of the population. There is an inverse relationship between the size of the sample and the error, as defined above: the larger the sample, the smaller the error and vice versa. This relationship between the size of the sample and the error is demonstrated in the figure below. When the size of the sample is increased, the statistics of the sample come to a close approximation to those of the population, the error is reduced and, hence, a larger sample is preferred. But a very large sample may not yield that result because it tends to create several problems, which sometimes become difficult to solve.

[Figure: Relationship between the size of sample and the error. Horizontal axis: size of sample (small to large). Vertical axis: error (small to large).]

The purpose of any sound sampling procedure is to generalize. In any behavioural research, an estimate about the whole population is made on the basis of the characteristics of the sample drawn from the population. Investigations are carried out upon samples and the conclusion is generalized to the population. Apart from serving this function of generalization, sampling has some additional advantages.

1. Accuracy is increased: When one is entrusted with examining samples (and not the whole population), a great volume of the work is reduced. Consequently, one can supervise the task of examining the sample, and the results can also be analyzed with greater accuracy. Not only this, the services of trained personnel can be utilized at a lower cost when the number of cases to be processed is low. The overall result of all this is that the accuracy of the investigation is increased.

2. Cost is reduced: When the data are to be collected from a limited number of cases rather than from the whole population, this naturally entails a reduced cost to the investigator. Not only this, the time taken in processing of the data is also reduced. When one is dealing with only a limited number of data, the chances of accidental error in statistical calculation are also minimised.

3. Speed is increased: When the data are collected from a sample rather than from the whole population, the investigator can do his work speedily without doing any injustice to the investigation. This is one of the vital considerations, specially when the information about the sample is needed urgently.


SAMPLING DISTRIBUTION 


In any sampling situation, three types of distributions are of common interest to the investigator: the sample distribution, the population distribution, and the sampling distribution. The meaning of and distinction between these three types of distributions may be explained through an illustration.

Suppose in a small town, the total number of graduate parents is 30,000. Suppose further that we are interested in knowing the mean fertility rate among these parents. Owing to time and money considerations, we cannot examine the whole population and therefore, we decide to examine only a random sample of 300 parents. The observed data may be arranged in a frequency distribution and the desired statistics may be calculated. This type of distribution is known as the sample distribution because it shows the distribution of data obtained from a sample drawn randomly from the population. Now, if with the help of any social organization or institution we are able to record the fertility rate of 30,000 parents and arrange them in a frequency distribution, this distribution is termed as the population distribution. In reality, the form and the parameters of a population distribution are not known. Therefore, we estimate these two parameters of the population on the basis of the form and the statistics of the sample distribution. If the form of the sample distribution is normal, we expect that the form of the population distribution will also be normal. Likewise, the mean and the standard deviation of the fertility rate of the graduate parents in the sample are taken as estimates of the mean and the standard deviation of the population distribution.

If we take, say, 100 samples of a given size (in this example, samples consisting of 300 parents each) from the population, we will get different sample distributions. From these distributions we will get 100 different means and 100 different standard deviations. If we arrange the 100 means and the 100 standard deviations into frequency distributions separately, we will get the sampling distribution of mean and the sampling distribution of standard deviation. The sampling distribution is a theoretical or idealized distribution, which is obtained if all possible samples of the given size are taken from the population. Thus, a sampling distribution is a distribution of statistics obtained by selecting all the possible samples of a specific size from a population.

SAMPLING ERROR


As the reader must have gathered by now, sampling refers to the process of drawing a limited number of cases from a population. Whatever the sampling procedure used, the sample must represent, as far as possible, all characteristics of the population that are related to the problem under study.

A statistical measure calculated from a sample is known as a statistic, and the corresponding measure, which is usually unknown and is based directly upon the population, is known as a parameter. A statistic is frequently referred to as an ‘obtained measure’ or ‘derived measure’. A mean calculated from a sample of a population is called ‘statistic mean’ and the value of mean based directly upon the population is known as ‘parameter mean’. A sampling error is the difference between the parameter and the statistic. As Lindquist (1968, 8) has put it, “A sampling error is the difference between a parameter and an estimate of that parameter which is derived from a sample.”


a Statistic and the corresponding 


A sampling problem is said to exist whenever the conclusions obtained on the basis of a limited number of individuals (called the sample) are applied to a larger number of individuals (called the population). Such generalized conclusions are known as statistical inferences. The higher the sampling error, the poorer the statistical inference. The term ‘sampling error’ does not suggest any mistake in the process of sampling itself, rather it merely suggests the chance variations which are inevitable when a number of randomly selected samples are taken from the same population. Due to this sampling error, making an estimate about the population characteristics from the sample characteristics no longer remains an exact process. It has been observed that a sample statistic does not ordinarily agree with that obtained from the second, third or fourth sample or with the population parameter. The amount of fluctuation (or the sampling error) depends upon the following two factors:

(i) Variability in the population
(ii) Size of the sample
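
The dependence of the sampling error on these two factors can be illustrated with a small simulation. The sketch rests on assumptions that are not part of the text: both populations are synthetic, their standard deviations (5 and 20) are arbitrary, and the function name `mean_abs_sampling_error` is hypothetical.

```python
import random
import statistics


def mean_abs_sampling_error(population, n, trials, rng):
    """Average of |sample mean - population mean| over repeated random samples of size n."""
    mu = statistics.mean(population)
    errors = [abs(statistics.mean(rng.sample(population, n)) - mu) for _ in range(trials)]
    return statistics.mean(errors)


rng = random.Random(1)  # fixed seed, only to make the sketch repeatable

# Two synthetic populations with the same centre but different variability.
narrow = [rng.gauss(100, 5) for _ in range(10_000)]
wide = [rng.gauss(100, 20) for _ in range(10_000)]

for n in (25, 100, 400):
    print(n,
          round(mean_abs_sampling_error(narrow, n, 200, rng), 2),
          round(mean_abs_sampling_error(wide, n, 200, rng), 2))
```

In each row the more variable population shows the larger error, and in each column the error shrinks as the size of the sample grows.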


If the nature of the population is such that it is not stable and tends to vary, naturally the successive samples taken from the same population over time will tend to differ. Generally, a random sample provides a good approximation to the population, and the closeness of the approximation depends upon the size of the sample. As the size of the sample is increased, the form and the statistics of the sample distribution come nearer to those of the population distribution and hence, the sampling error becomes less. For example, when the size of the sample is 100, the distribution as represented by the frequency polygon is usually smooth enough to give a very clear idea about the form of the population distribution.

There are certain statistics which tend to fluctuate less than others in successive samples. For example, in sampling from a normal distribution, the mean among the measures of central tendency and the standard deviation among the measures of variability fluctuate less than other similar measures. Hence, these two are more reliable than other similar measures because they approximate more closely the corresponding population parameters and hence, tend to fluctuate less from sample to sample and thus produce the least sampling error.

In most sampling situations, there are three types of important distributions which are of concern to the reader in order to have a clear-cut idea about the application of the sampling. They are: the sample distribution, the population distribution and the sampling distribution. These three terms may be explained with an example. Suppose we wish to know the mean intelligence score of workers (in the age range of 20-45 years) in the Tata Iron and Steel factory. Further suppose that the total number of workers in this given age range is 30,000. Owing to time and money considerations, we may decide to take a random sample of 300 and administer a standardized test of intelligence. The obtained scores on the test by 300 workers, expressed in a frequency distribution, will be called the sample distribution. Thus, the sample distribution is an observed distribution which can be described by any descriptive statistics. Hence, the sample distribution is the concrete distribution in any sampling situation.

However, if we decide to administer the test of intelligence on the whole population, that is, 30,000 workers, we shall be getting a set of 30,000 scores. If we convert this set into a frequency distribution, this will be known as the population distribution. Ordinarily, the form and the parameters of the population distribution are not known. However, we can infer them on the basis of the sample distribution. If the sample distribution is normal, it can be inferred that the population distribution is also normal, ascribing any irregularities in the sample distribution to the sampling fluctuations.

The inferences drawn regarding the population distribution on the basis of the sample distribution may not be correct due to sampling errors. In the above example, we do not have the fixed mean and the fixed standard deviation which would be obtained again in the second or third sample of 300 from the population. If we were to take, say, 100 samples of 300 each, we would be getting 100 different means and 100 varying standard deviations. If all the 100 means and 100 standard deviations are grouped in frequency distributions separately, we would have an empirical approximation to the sampling distribution of the mean and of the standard deviation. An exact sampling distribution would require every possible sample of the given size. As this is not practicable, the sampling distribution is a theoretical or idealized distribution which may be thought of as the distribution that would be obtained if the values of a given statistic were computed in all possible samples of a given size actually taken from the population. When the sampling distribution of a statistic is known, it is possible to determine the relative frequency with which different values of the statistic tend to occur in random sampling from a population in which some of the statistics have assumed or hypothesized values. After knowing the relative frequency, the investigator is able to know the probability that a particular sample value has resulted from the sampling fluctuations or chance. Ultimately, by knowing this figure, he will be able to judge the soundness of a hypothesis regarding the value of statistics in a population.





Review Questions

1. What do you mean by sampling? Discuss the factors that affect the decision to sample.
2. What is sampling distribution? Discuss the methods of drawing a sample.
3. Citing some examples, discuss the relative advantages and disadvantages of probability sampling methods.
4. Citing some examples, discuss the relative advantages and disadvantages of nonprobability sampling methods.
5. What is ‘systematic sampling’? How does it differ from ‘cluster sampling’? Discuss the relative advantages and disadvantages of systematic sampling.
6. Explain the importance of sampling in research. Discuss any two methods of sampling.
7. What are the requisites of a good sample? State the advantages of randomization in scientific research.
8. What is the greatest danger in using accidental sample in research?
9. When is cluster sampling likely to be preferred most?
10. Why would a researcher choose stratified random sampling instead of simple random sampling?


15
SOCIAL SCIENTIFIC RESEARCH


CHAPTER PREVIEW


• Meaning and Characteristics of Scientific Research
• The Scientific Approach to the Study of Behaviour
    The Assumptions of Science
    Attitudes of Scientists
• The Flaws in Scientific Research
• Validity in Research or Experimental Validity
• Controlling Threats to Reliability and Validity in Research
• Phases or Stages in Research
    Identifying the Problem
    Formulating a Hypothesis
    Identifying, Manipulating and Controlling Variables
    Formulating a Research Design
    Constructing Devices for Observation and Measurement
    Summarizing Results
    Carrying out Statistical Analysis
    Drawing Conclusions
• Types of Educational Research
• Types of Research: Experimental and Nonexperimental
    Laboratory Experiments
    Field Experiments
    Field Studies
    Difference between Field Experiment and Field Study
    Ex Post Facto Research
    Survey Research
    Document or Content Analysis
    The Case Study
    Ethnographic Studies
• Difference between Research Method and Research Methodology
• Ethical Problems in Research
• Comparison between Experimental Research and Nonexperimental Research
• Types of Experiment
• Types of Applied Research


MEANING AND CHARACTERISTICS OF SCIENTIFIC RESEARCH

The word research consists of two syllables, ‘re’ and ‘search’. The dictionary meaning of the prefix ‘re’ is again or over again and the dictionary meaning of the word search is to examine carefully or to probe. Together they form the noun research describing a careful and systematic investigation in some field of knowledge, undertaken to establish facts or principles (Grinnell 199...). Therefore, any scientific research is a systematic and objective attempt to provide answers to certain questions or problems. The purpose of scientific research is to discover and develop an organized body of knowledge. Therefore, scientific research may be defined as the systematic and empirical analysis and recording of controlled observation, which may lead to the development of theories, concepts, generalizations and principles, resulting in prediction and control of those activities that may have some cause-effect relationship. A similar definition has been given by Kerlinger (1973, 11): “Scientific research is a systematic, controlled, empirical and critical investigation of hypothetical propositions about the presumed relations among natural phenomena.” In order to elaborate the above views, some of the characteristics of systematic research are presented below.


1. Research is always directed towards the solution of a problem. In other words, in research the researcher always tries to answer a question or to relate two or more variables under study.

2. Research is always based upon empirical or observable evidence. The researcher rejects those principles or revelations, which are subjective and accepts only those revelations, or principles, which can be objectively observed.

3. Research involves precise observation and accurate description. The researcher selects reliable and valid instruments to be used in the collection of data and uses some statistical measures for accurate description of the results obtained.

4. Research gives emphasis to the development of theories, principles and generalizations, which are very helpful in accurate prediction regarding the variables under study. On the basis of the sample observed and studied, the researcher tries to make sound generalizations regarding the whole population. Thus, research goes beyond immediate situations, objects or groups being investigated by formulating a generalization or theory about these factors.

5. ... methods employed, data collected and conclusion reached. He frames an objective and scientific design for the smooth conduct of his research. He also makes a logical examination of the procedures employed in conducting his research work so that he may be able to check the validity of the conclusions drawn.

6. Research is marked by patience, courage and unhurried activities. Whenever the researcher is confronted with difficult questions, he must not answer them hurriedly. He must have patience and courage to think over the problem and find out the correct solution.

7. Research requires that the researcher has full expertise of the problem being studied. He must know all the relevant facts regarding the problem and must review the important literature associated with the problem. He must also be aware of sophisticated statistical methods of analyzing the obtained data.

8. Research is replicable. The designs, procedures and results of scientific research should be replicable so that any person other than the researcher may assess their validity. Thus, one researcher may use or transmit the results obtained by another researcher. Thus, the procedures and results of the research are replicable as well as transmittable.

9. Finally, research requires skill of writing and reproducing the report. The researcher must know how to write the report of his research. He must write the problem in clear terms; he must define complex terminology, if any; he must formulate a clear-cut design and procedures for conducting research; he must present the tabulation of the result in an objective manner and also present the summary and conclusion with scholarly caution.


THE SCIENTIFIC APPROACH TO THE STUDY OF BEHAVIOUR

As we know, the major difference between scientists and nonscientists lies in the way they go about answering the questions they ask about nature. Scientists use the scientific method for answering questions about nature. Psychology is a science like physics, chemistry or biology because psychologists employ the scientific method. Scientists employ a certain philosophy towards the entire issue of learning about nature. In fact, this philosophy starts with the way scientists try to conceptualize nature. The scientific method consists of some assumptions, attitudes, goals and procedures for creating and answering questions about nature.



The Assumption of Science

Any aspect of nature, especially human behaviour, seems not only complex but also mysterious. Despite this, we make consistent effort to understand it. In so doing, science makes certain assumptions about nature that help us to study it as a regulated and consistent system. These assumptions may be explained as under:

1. Nature is lawful: The basic assumption underlying the scientific method is that nature tends to behave in a lawful manner. This assumption is made because if nature is not lawful and rather is random, it can never be understood fully. As physicists discuss how the law of gravity governs the behaviour of planets, psychologists also assume that there are laws of nature that govern the behaviour of living organisms. Behaviours are regulated by natural laws and therefore, any behaviour can be explained in terms of specific causes and effects. For example, when psychologists are studying the development of language in humans, they are, in fact, studying the laws of nature.


Some psychologists, however, would argue otherwise, saying that, in fact, it is too much to say that a research study directly examines a law of nature because of the narrow and limited focus of each study. Apart from this, most researchers are frequently confronted with contradictory findings and opposing explanations. However, we can assume that eventually all of the diverse findings will be integrated and coordinated so that psychologists can truly understand the laws of nature.


2. Behaviour is deterministic: Psychologists assume that the behaviour of the organism is determined. According to the doctrine of determinism, behaviour is solely influenced by natural causes. Determinism, which is distinct from predestination, suggests that while there is no overall plan, there are predictable, identifiable, and natural causes for every behaviour. Thus behaviour of the living organism does not result from ‘free will’ or choice. Thus the science of psychology clearly states that persons cannot freely choose to exhibit a particular personality or respond in a particular way in a given situation. The laws of behaviour force persons to exhibit certain attributes and to behave in a certain way in a certain situation. Any two persons in the same situation will be similarly influenced because that is how the laws of behaviour operate. But due to some individual differences, no two individuals will behave identically.

3. Nature is understandable: Although nature is mysterious and complicated, we assume that it can be understood. We may have not yet fully understood some aspects of nature, but it is assumed that we will eventually be able to understand it. This assumption has two important implications. First, any statement cannot be taken on faith. In science, it is always appropriate to question and to ask for proof. Second, any scientific statement must logically and rationally fit within the known facts so that it can maximally be understood.

Conclusively, it can be said that any behaviour, in order to be studied scientifically, must be assumed to be lawful, determined and understandable.



Attitudes of Scientists

Scientists tend to differ from nonscientists in their attitudes towards understanding nature. They employ certain types of attitudes that guide their activities in a certain way.

1. Scientists are open-minded: Scientists adopt an attitude of open-mindedness in their approach. They must leave their biases and preconceptions behind. Any explanation may offend their sensibilities or contradict their beliefs, but simply due to such contradictions, they don't dismiss it. They look in different directions, at all possible aspects, when trying to understand a behaviour.

2. Scientists are uncertain: Scientists recognize this basic principle that there is always a degree of uncertainty. At any one point they may feel that they have some understanding, but no one truly knows everything about how nature operates. For psychologists, this obviously means that no one knows precisely what a behaviour entails, no one knows the ideal and correct way to study a behaviour, and no one knows all of the factors that influence behaviour.

3. Scientists are fallible: Scientists develop an attitude that everyone is fallible. We make mistakes because we don't know how nature works. If we pay attention to the history of science, it will be obvious that the description that at first appeared accurate later turned out to be wrong or inaccurate. Therefore, scientists always question whether the factors proposed as important might actually be irrelevant, and whether the factors proposed as irrelevant might actually be important.

4. Scientists are cautious: Scientists are very cautious when dealing with scientific findings. They never treat the results of any single study as a fact in the usual sense. Instead, they treat a research as finding merely a piece of evidence that provides them with some degree of confidence in the statement about nature. For this reason, the researcher should never use the terms ‘proof’ or ‘proved’.

5. Scientists are skeptical: Scientists are very skeptical about accepting the truth of any scientific statement. This is because of the fact that no one knows how nature works, and because we are fallible, any description of nature may prove to be incorrect. Scientists critically evaluate the evidence using various types of logic and their knowledge of psychological researches in order to know whether the evidence for any statement is correct or conversely, is fraught with several kinds of problems and contradictions. There is also another reason for scientists to be skeptical: scientists sometimes do fraudulent acts, that is, they may publish research findings when no research was actually conducted or sometimes inaccurately report their findings. Often in such a situation, they simply try to respond to professional pressures to be a productive researcher.



6. Scientists are ethical: Scientists adopt an ethical attitude as they are very much concerned about balancing the goals and desires of the researcher with responsibility to cause no harm to others. First, researchers in psychology have an ethical responsibility to conduct a research in such a way that may not cause harm to others. Second, researchers have also ethical responsibility of not conducting any type of fraudulent act, which includes falsifying results and keeping such a result secret that contradicts one's views.

In this way, it can be said that scientists have adopted certain attitudes towards nature including conduct of human behaviour. They recognize fallibility of all researchers. Therefore, they are open-minded, skeptical, cautious and ethical, too, in dealing with subjects or participants of the study.


The Flaws in Scientific Research

Any scientific research employs many rules and objective criteria. Despite this, these rules and criteria don't guarantee to eliminate bias and error completely from the research. As we know, there are many ways to design a study, and because human nature is complex, there are also many opportunities for researchers to make errors in either collecting data or interpreting it. The flaws in scientific research basically relate to flaws in scientific evidence and flaws in testing hypotheses and its interpretation. We shall consider those two flaws separately.



Ideally in any scientific research, observations should be completely empirical, objective, systematic and controlled, so that the researcher may be accurate in observation and measurement of the exact behaviour. However, the reality is that no study is perfect and ideal. In fact, any study may be more or less empirical, more or less objective, more or less systematic and more or less controlled. Basically, there are four factors that determine the degree to which a study meets the criteria for scientific evidence.

... some implicit behaviours which cannot be studied in a completely empirical, objective, systematic and controlled way. For example, ... We cannot observe ... and so observe some other behaviours and then draw inferences about ... the extent to which we are able to obtain precise and objective measurements ... the behaviour being studied. For example, for objectively assessing ... like aggressiveness or love, there is no yardstick and there is no single ... from which inferences can be drawn. For assessing these behaviours, the researchers are bound to employ some less concrete and more subjective measurement procedures which may include some biases or errors, too. Likewise, a researcher cannot observe all aspects of all behaviours in a systematic and controlled way. For example, suppose he is studying attitudes of females towards child rearing. Here it is difficult for the researcher to separate the fact that each subject has a female's personality from the fact that each subject has also a female's genes and physiology. In such a situation, the researcher cannot be sure whether it is a female's personality or her physiology (or both) which is influencing her attitudes.

In view of the above limitations, flaws creep into the scientific evidence and then, we cannot know what our measurements reflect about the behaviour or what factors are truly operating.



Technical limitations: Sometimes there is a lack of necessary technical capabilities and thus the researcher is restricted in observing and measuring some behaviour or factor. In such a situation, the research may yield very limited and potentially misleading information. For example, in the late 1800s, ‘phrenology’ was included in psychology. As we know, phrenology basically denotes that various personality traits are reflected by the size of the parts of the brain and thus, by bumps on certain parts of the skull. Considering today’s technology for studying brain physiology and personality, the study of phrenology seems silly.


... the research design specifies ... where the study is to be conducted (that is, whether laboratory or field), the types of subjects we will study, the way we will measure a behaviour and ... factors. All these specifications of the research design, on the one hand, increase objectivity to some extent but also tend to restrict and reduce our confidence in the findings.

Limited perspective of a study: The evidence obtained from a particular study may be convincing but ... In other words, due to limited perspective any ... may not represent the nature because in one study the researcher takes one approach which automatically means that he is excluding others. In fact, this situation is illustrated by an old fable which tells of several blind men trying to describe an elephant. In studying a behaviour, the researchers are like the blind men, trying to describe the entire elephant by taking only one limited perspective at a time.



These are the major flaws in scientific evidence. Even when the evidence from a study reflects a minimum of flaws, we still cannot be sure about its findings and conclusion because the interpretation of the results is also to be taken into consideration. Inappropriate interpretation of testing the hypotheses may introduce several flaws.


Flaws in testing hypotheses

Flaws in testing hypotheses are also important sources of flaws in scientific research. Often, it is found that while testing a psychological hypothesis, some data appear to confirm a hypothesis while other data appear to disconfirm it. Therefore, it is essential that the researcher critically evaluate and weigh the various types of evidence and the research that produced it. Let us illustrate the whole sequence through an example. Suppose the hypothesis being tested is: Dreams predict the future. If a person accepts it, it is probably because he has occasionally dreamed of events that ultimately occurred. But let us consider the quality and quantity of evidence that supports this hypothesis. There are several points to be considered. First, this hypothesis is suspect because it is neither rational nor parsimonious. Second, the recall and interpretation of the dream and the feeling about how they match up with the real events actually lack objectivity. Third, here the researcher is relying only upon confirmation. His observation is consistent with the hypothesis that dreams tend to predict the future. But this weak evidence should be weighed against the amount of disconfirmation that is available, that is, the number of times the dreams failed to come true. In sum, it can be said that there is much evidence which disconfirms the hypothesis that dreams predict the future. Therefore, it could be said that those few confirming instances are nothing more than coincidence.


Thus it is obvious that our confidence in any scientific hypothesis is based on the quantity 
and quality of evidence that confirms and disconfirms it. Accepting a hypothesis merely on the 
basis of confirmation and ignoring all those evidences that provide disconfirmation introduces 
important flaws in scientific research. 
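
The weighing of confirming against disconfirming instances can be made concrete with a simple tally. The following Python sketch is only an illustration: the function name `evidence_summary` and the dream-diary counts are hypothetical and are not data from any study.

```python
def evidence_summary(confirmed, disconfirmed):
    """Summarize how often a hypothesis was confirmed across all recorded instances."""
    total = confirmed + disconfirmed
    if total == 0:
        raise ValueError("no observations recorded")
    return {
        "confirmed": confirmed,
        "disconfirmed": disconfirmed,
        "hit_rate": confirmed / total,  # share of instances that confirm
    }


# Hypothetical dream diary: 3 dreams "came true", 197 did not.
summary = evidence_summary(confirmed=3, disconfirmed=197)
print(summary)
```

A person who remembers only the three confirming dreams sees strong support; the tally shows that they form a very small share of all the dreams recorded, which is the point made above about ignoring disconfirmation.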

Finally, we can conclude that flaws in scientific evidence and in testing hypotheses and their interpretation provide a big challenge to modern researchers.


VALIDITY IN RESEARCH OR EXPERIMENTAL VALIDITY

Every researcher attempts to achieve maximum validity in his research work. Ordinarily, he has two basic aims in his research work. First, the researcher wishes to see the extent to which the factors or variables manipulated in the study have a systematic effect upon the desired variable in an experimental setting and the extent to which the desired variable is not affected by uncontrolled or extraneous factors. This aim of the researcher is called internal validity of the research. Internal validity is achieved when the independent variable alone affects the dependent variable. Thus internal validity indicates the extent to which an experiment is methodologically sound and free from confounding. Researches in which the effect obtained cannot be attributed with confidence to the independent variables are called internally invalid research. In contrast, the research has internal validity when the effect can be confidently attributed to the manipulation of the independent variable. Tuckman (1978, 4) has said, “A study has internal validity if the outcome of the study is a function of the programme or approach being tested rather than the result of other causes not systematically dealt with in the study.” Besides the above aim, the researcher has another aim, which is primarily concerned with the generalizations of the research. He wishes to generalize the conclusion reached on the basis of his research in relation to similar situations or programmes in nonexperimental settings. The degree to which this goal is achieved reflects the external validity of the research. Again, to quote Tuckman (1978, 4), “A study has external validity if the result obtained would apply to the real world, to other similar programmes and approaches.”
sidera 


Of these two types of validity, internal validity is of high value because if a research or experiment has little or no internal validity, its conclusion cannot be generalized or can be generalized only with considerable risk and hence, its external validity will automatically be affected.

There are several factors that tend to influence the internal and the external validity of research. In a behavioural experiment many extraneous variables are present, which are likely to influence the dependent variable. Extraneous variables (also known as relevant variables) are those uncontrolled or unmanipulated variables that may produce a significant effect upon the dependent variables. Although these extraneous variables cannot be altogether eliminated from the experimental setting, they can be controlled so that their effect upon the results of the experiment can be minimized. There are a number of extraneous variables or factors that are likely to affect the internal validity and the external validity. Within the category of internal validity are many factors. Some of them are mentioned below.

1. Maturation: Maturational changes are likely to be confused with changes produced by the manipulation of independent variables. This is particularly true in the case of a long-term experiment. When the experiment continues for a longer period of time, the subjects may become wiser due to incidental learnings occurring during the period of experimentation. In such a case, a change is likely to occur which may provide a threat to the internal validity of the experiment.

2. Unsound instrumentation: Normally, the researcher uses some instruments or tests for assessing different aspects of behaviour. If those tests are less reliable and less valid, errors are likely to be introduced in the measurement, which may ultimately tend to reduce the internal validity of the experiment. If human observers are used in place of tests or instruments, errors relating to human observations such as errors due to fatigue, increased experiences, and changes in mood of the observers are likely to affect the internal validity of the experiment.

3. Immediate external situations: Immediate external situations are likely to affect the internal validity of the research. The subject may be worried due to failure in a university examination; he may not have slept the previous night due to fatigue from a late night party; or he may be too shocked after hearing the news of the death of his mother or father, and so on. All these external events are beyond the control of the experimenter or the researcher and tend to reduce the subjects’ consistency in behaviour and thus the internal validity of the research.


4. Testing: Testing means the effects of pretesting upon the performance of the subjects during the post-testing. This process of testing sometimes also tends to induce a threat to the internal validity of the research. When the subjects are tested in the beginning of the experiment, two consequences are likely to be the most obvious ones on the post-testing performance. First, the subjects may be very conscious of the latent purpose of the test. This, in turn, is likely to produce some changes within the subject. Second, the pretesting may make the subjects more skilled and proficient on subsequent testing or post-testing. This, again, is likely to produce some changes in the subjects. Both these types of changes are likely to affect the internal validity of an experiment adversely.

5. Selection bias: Selection bias is also likely to affect the internal validity ... When the control group is not equated to the experimental group, differences are likely to arise in motivation, interest, and even in ability. All this affects the internal validity adversely. Probably the best way to control this type of selection bias is to make use of the random assignment (also called randomization) of subjects to the experimental group and the control group.



6. Experimental mortality: Experimental mortality means the loss of subjects during the period of experimentation. Experimental mortality is often seen in the case of long-term experiments. Generally, it has been reported that those subjects who remain during the entire period of experimentation are more motivated and more attentive than those who either drop out in the middle or are forcefully dropped. This is a confounding effect in the sense that it creates an imbalance among the subjects who had been initially selected and ultimately poses a threat to the internal validity of the experiment.

7. Statistical regression: The phenomenon of statistical regression is often seen in pretest-post-test experiments and where the groups are selected on the basis of extreme scores. Subjects who score higher on pretest (that is, selected on the basis of high scores) tend to score lower on the post-test and the subjects who score lower on pretest tend to score higher on the post-test. Normally, this regression takes place towards the mean of the distribution. In other words, the scores of the high scorers will tend to decrease towards the mean while the scores of the low scorers will tend to increase towards the mean on the post-test. Thus the two groups differ on the post-test scores even without any treatment given to them. Probably, the reason is that chance factors affect the extreme scores more than average scores. When a researcher fails to recognize this, a threat to the internal validity is apparent.

8. History: The term ‘history’ as used in research refers to those events of the environment 
which occur at the time the experimental variable is being tested. Suppose tomorrow a group of 
students (all Indians) is to be administered a test measuring prejudice towards the Chinese. The 
same night China invades India. Such external events are likely to affect the scores on the test of 
prejudice. Probably, the subjects will show a strong prejudice against the Chinese. In such a 
situation, the measured outcome on the test may not reflect the true prejudice. 

9. Diffusion of treatment: Diffusion of treatment, also known as contamination, is the threat that research participants in different groups are likely to communicate with each other and learn about the treatments of others. Experimenters try to avoid it by isolating groups or by making the participants promise not to reveal anything to others who will act as participants. For example, suppose an experiment is being conducted in which the participants are kept engaged for the whole day on a new way to memorize the task. During break, the participants of the experimental group tell those participants in the control group about the new way to memorize and that technique may be used by the participants of the control group. Such diffusion of treatment to the control group is likely to provide threat to the internal validity of research.

10. Compensatory behaviour: Sometimes experimenters may provide something of value to one group of participants but not to another. Such differences become known. This inequality tends to produce pressure to reduce differences and rivalry between groups, or sometimes resentful demoralization is also produced. These behaviours, called compensatory behaviour, can affect the dependent variable in addition to the treatment. Such effect naturally becomes confounding. For example, suppose one school system receives a longer lunch break (treatment) for producing gains in learning whereas another doesn’t and rather continues on doing hard work for learning. Such system of the same school may be demoralizing and the participants or students may withdraw from learning.

11. Experimenter’s expectation: Although experimenter’s expectation is not always considered a threat to the internal validity, sometimes the experimenter’s behaviour can threaten internal validity. An experimenter may threaten internal validity by indirectly communicating experimental expectancy to the participants. The experimenter may be highly committed to the problems and hypotheses of the study and indirectly communicate ... to the participants. For example, suppose the experimenter has hypothesized that female participants will be more sensitive towards disability than male participants. Through nonverbal communication, the experimenter may unconsciously encourage female participants to report positive feelings towards persons with disabilities. Thus, the experimenter’s expectancy becomes a threat to the internal validity. As we know, the double-blind experiment is conducted to control the experimenter’s expectancy. In a double-blind experiment both the participants as well as those in contact with the participants (experimenter’s assistant) are, in a sense, blind to the details of the experiment.
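
Two of the threats listed above, selection bias and statistical regression, lend themselves to a small simulation. The following Python sketch is an illustration under stated assumptions, not a procedure taken from the text: the function names, the normal distributions, and the score parameters (mean 100, true-score and error standard deviations of 10) are all hypothetical choices made for the example.

```python
import random
import statistics


def simulate_scores(n, rng, mean=100.0, true_sd=10.0, error_sd=10.0):
    """Return (pretest, posttest) score lists for n subjects.

    Assumed model: each subject keeps the same true score on both
    occasions; only the chance error differs. No treatment is applied.
    """
    pretest, posttest = [], []
    for _ in range(n):
        true_score = rng.gauss(mean, true_sd)
        pretest.append(true_score + rng.gauss(0.0, error_sd))
        posttest.append(true_score + rng.gauss(0.0, error_sd))
    return pretest, posttest


def regression_demo(n=2000, fraction=0.1, seed=0):
    """Compare pretest and post-test means of groups chosen for extreme pretest scores."""
    rng = random.Random(seed)
    pretest, posttest = simulate_scores(n, rng)
    order = sorted(range(n), key=lambda i: pretest[i])  # subject indices, low to high
    k = int(n * fraction)
    low, high = order[:k], order[-k:]

    def group_mean(indices, scores):
        return statistics.mean(scores[i] for i in indices)

    return {
        "high_pre": group_mean(high, pretest),
        "high_post": group_mean(high, posttest),
        "low_pre": group_mean(low, pretest),
        "low_post": group_mean(low, posttest),
    }


def random_assignment(subjects, seed=0):
    """Shuffle the subjects and split them into experimental and control groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


result = regression_demo()
experimental, control = random_assignment(range(20))
print(result)
print(experimental, control)
```

Under these assumptions the group selected for high pretest scores falls back towards the overall mean on the post-test and the group selected for low scores rises, although no treatment was given. Random assignment, by contrast, leaves group membership to chance, so systematic differences in motivation or ability are not built into the two groups.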

Like internal validity, the external validity of research is also threatened by several
variables or factors. In general, the factors which pose a threat to internal validity also pose
a threat to external validity. Besides these, there are other additional factors which tend to
reduce the power of the experiment to generalize the obtained results to a wider population of
interest. Some of these factors, as discussed by Campbell and Stanley, are:

1. The researcher tries his best to control all extraneous variables in the experiment. As a
consequence of this effort, the experimental situation becomes artificial and less resembles the
real life situation regarding which generalizations are to be made. The overall result is that from
such an experimental situation, no concrete and sound generalization can be made.


2. Interaction effects of selection bias: We have seen how selection bias poses a threat to the
internal validity of the research. It becomes a threat to the external validity when the researcher
does not draw a random sample or a representative sample from the population. When
the samples are not representative of the population, it becomes difficult for the researcher
to generalize the conclusion. Psychological, educational and sociological researches rarely, if
ever, utilize a random sample. Most of these researches are conducted on samples
which are easily available to the researchers. Hence, generalization from such samples can be
made only with considerable risk. In order to eliminate the selection bias, as it relates to external
validity, it is essential that the researcher draw a random sample from the population.
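The difference between a random sample and an easily available sample can be illustrated with a
short Python sketch. The population, the function names and the assumption that the most easily
reached members are also the lowest scorers are all hypothetical and are chosen only to make the
bias visible; this is not a procedure prescribed by the text.

```python
import random

def draw_random_sample(population, size, seed=0):
    """Simple random sample: every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, size)

def draw_available_sample(population, size):
    """Easily available sample: simply the first members the researcher can reach."""
    return population[:size]

# Hypothetical population of 1,000 scores, ordered so that the members who are
# easiest to reach (the first ones) are systematically the lowest scorers.
population = list(range(1000))
population_mean = sum(population) / len(population)

random_sample = draw_random_sample(population, 100)
available_sample = draw_available_sample(population, 100)

random_mean = sum(random_sample) / len(random_sample)
available_mean = sum(available_sample) / len(available_sample)

print("population mean:", population_mean)
print("random sample mean:", random_mean)
print("available sample mean:", available_mean)
```

Under these assumptions the mean of the available sample lies far from the population mean, while
the mean of the random sample stays close to it, which is why generalization from easily available
samples carries risk.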


3. Prior knowledge about subjects: When the researcher has some prior knowledge about
the subjects, he is likely to show either personal bias in the form of favour, or he may provide some
clue that will affect the objective judgement of the subjects. In either case, sound generalization
from the experimental setting becomes difficult.

4. Carry-over effects of prior treatment: When the experimental design is such that the same
subjects serve as members of both the control and the experimental conditions, the chances are 
that the effect of the first treatment will not be fully wiped out and thus the effect is carried to the 
second treatment. The effect may be facilitating or interfering, depending upon the nature of the 
problem being studied. In either case, generalization becomes hazardous. 


5. Hawthorne effect: This effect was first pointed out in 1920 by Mayo, Roethlisberger and
Dickson in a series of experiments conducted at the Hawthorne plant of the Western Electric
Company in Chicago. The purpose of the experiments was to examine the effect of certain
working conditions upon the workers' output. The intensity of light was one of the manipulated
variables. The investigators found that as the intensity of light was increased, the workers' output
also increased. After reaching a peak, the investigators gradually decreased the intensity of light
and, surprisingly enough, the workers' output again increased instead of decreasing. It was
concluded that the workers' awareness of participation and the attention given to them were
important motivating factors, which led to the increased output even during the poor level of
illumination. From these studies, the term 'Hawthorne effect' was introduced and it refers to the
increments in performance prompted by the subjects' awareness of participation in an
experiment. When subjects demonstrate the Hawthorne effect in any experiment, the results of
the experiment are likely to differ from the results of those experiments where no such effects are
being demonstrated. Therefore, the conclusions thus reached become difficult to generalize.





Since there are several perspectives from which one can generalize the obtained results in
an experiment or study, there are two additional aspects of external validity which must be
examined: temporal validity and ecological validity.

Temporal validity is the extent to which generalizations from a study are accurate
across different time periods. It has two aspects:

(i) A treatment in a study may last for a particular period of time and there may also be a
fixed period of time between a treatment and the measurement of dependent variables.
Here, temporal validity is the extent to which the observed relationship between the
variables can be generalized to other time frames.
(ii) A study occurs in a specific time, that is, in a particular month or year. Here temporal
validity is the extent to which results can be generalized to other months or years.

Many studies done in the past are still considered valid today. Studies of basic psychological
processes such as thinking are still valid today. However, studies of various social processes like
attitudes and other such behaviours, which are influenced by societal changes, generalize less
well today. Therefore, studies of such social processes have a poor degree of temporal validity.

Ecological validity is the extent to which the situation and behaviours in a study are like
those found in natural settings or environment. In other words, ecological validity is the extent to
which the experimental situation can be generalized to natural settings. For example, many
experiments done in the field of verbal learning and memory utilize nonsense syllables or
consonant syllables. But such syllables are not an everyday experience for most people.
Therefore, such a learning process is an unusual one that occurs only in the laboratory. Thus such
research lacks ecological validity. We can never be confident that it accurately generalizes to and
describes natural and day-to-day learning processes of the person. In this way we can finally
conclude that if the research design does not have ecological validity, we end up focusing on
what the subjects can do in the experiment and not what they usually do in their lives.


CONTROLLING THREATS TO RELIABILITY AND VALIDITY IN RESEARCH

Threats to the reliability and validity of a research arise from a variety of conditional factors.
Some of them have already been discussed. Here, we shall focus on those ways that try to
anticipate and eliminate anything that may threaten the reliability and validity. Following are
some such important ways:

(i) Doing a reliable manipulation: Reliable manipulation of independent variables is one
way to arrest such threats. As we know, reliability means that our results are consistent and
contain no error. Reliability is sought not only in terms of measuring subjects' scores but is also
sought in terms of manipulating the independent variable. By reliable manipulation of the
independent variable is meant that all subjects in one condition must receive the same treatment,
that is, the same amount of the independent variable, and when the experimenter changes to another
condition, all subjects there should receive the same, new amount of the independent variable. For
example, if the experimenter is interested in knowing the impact of degree of temperature upon
work performance, and if he selects two levels or conditions of temperature, say, 60 degrees F and
90 degrees F, then he wants precisely 60 degrees for all subjects in one condition and 90 degrees
for all subjects in the other condition.

If there is inconsistency in our manipulations, the conclusion will be based on the wrong
amounts of the independent variables. Apart from this, if the experimental situation is in any way
different for each subject (due to inconsistency in manipulation of independent variables), we
shall get different responses and thus, variability among scores within each condition, thus
producing a higher degree of error variance. The greater the error variance, the poorer the strength
of relationship and dependability upon the variables under study.
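The link between inconsistent manipulation and error variance can be made concrete with a small
simulation. The sketch below is only illustrative: the linear relation between temperature and
performance, the size of individual differences, the sample size and all function names are
assumptions introduced here, not values taken from any study.

```python
import random
from statistics import pvariance

def run_condition(target_temp, n_subjects, manipulation_sd, rng):
    """Simulate performance scores for one temperature condition."""
    scores = []
    for _ in range(n_subjects):
        # Temperature actually experienced by this subject; it equals the
        # target exactly only when manipulation_sd is zero.
        actual_temp = target_temp + rng.gauss(0, manipulation_sd)
        # Assumed model: performance falls 0.5 points per degree above 60 F,
        # plus individual differences unrelated to the treatment.
        score = 100 - 0.5 * (actual_temp - 60) + rng.gauss(0, 2)
        scores.append(score)
    return scores

def within_condition_variance(manipulation_sd, seed=1):
    """Average variance of scores inside the 60 F and 90 F conditions."""
    rng = random.Random(seed)
    cool = run_condition(60, 2000, manipulation_sd, rng)
    warm = run_condition(90, 2000, manipulation_sd, rng)
    return (pvariance(cool) + pvariance(warm)) / 2

reliable = within_condition_variance(manipulation_sd=0.0)
unreliable = within_condition_variance(manipulation_sd=5.0)

print("error variance, reliable manipulation:", round(reliable, 2))
print("error variance, unreliable manipulation:", round(unreliable, 2))
```

With the temperature held exactly at 60 or 90 degrees, the within-condition variance reflects only
individual differences; when the delivered temperature itself varies from subject to subject, the
same conditions show a larger within-condition (error) variance.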


(ii) Control of confounding variables: In any experiment, the experimenter seeks to conclude
that changes in independent variables are the real causes for changes in the dependent variable.
This is of paramount importance for internal validity. The experimenter must be confident that
the observed change is due to the manipulation of the independent variable and not to confounding
variables. For this it is essential that confounding variables are controlled. Confounding occurs
when some extraneous variable, that is, some other difference between the conditions, exists in
addition to the independent variable. The process of confounding can be explained through an
example. Suppose an experiment is conducted in order to know the impact of room temperature
upon subjects' aggressiveness. Now further suppose that this experiment has been designed in such
a way that in one condition a male experimenter is present when the room has one temperature,
and in another condition a female experimenter is present when the room has a different
temperature. In this case, we cannot determine whether differences in subjects' aggressiveness are
due to differences in room temperature or due to differences in the sex of the experimenter. This
is because of the fact that the variable of temperature is confounded with the sex variable.

It is essential that the experimenter first employ a design that eliminates potential
confounding. There are several ways to eliminate potential confounding by controlling extraneous
variables. The methods of elimination, balancing and randomization can be successfully employed
here. For other such methods of control, the reader should refer to the relevant chapter.

(iii) Defence against diffusion of treatment: Diffusion of treatment also threatens the
validity associated with the independent variable. In fact, diffusion of treatment occurs when
subjects in one condition become somehow aware of the treatment given in other conditions.
This happens particularly in a between-subjects or between-group design in which subjects who have
participated in one condition tell other potential subjects of the other group/condition about their
treatment. As a consequence, these potential subjects are now aware of the treatment to be given,
and hence the strength or impact of the manipulation is reduced. Not only this, the internal
validity is also threatened, that is, subjects, instead of being influenced by the condition of the
independent variable, are now influenced by the extraneous information about the condition of the
experiment.

It is, therefore, essential that some defensive steps be taken against such diffusion. These
are:
(a) One line of defence is to explain to the subjects why it is relevant and important that they
should not tell other potential subjects about the details of the experiment.

(b) The whole experiment or study should be conducted in a very brief period of time so that
the subjects may not get time to communicate with each other.

(c) Subjects might be tested from different locations so that they have little physical contact
with each other and thus, little opportunity to communicate with each other.

(d) If possible, the treatments can also be disguised.

(iv) Adoption of consistent procedures: A very important issue in designing the procedures
of the study is the need to produce consistency within and between conditions. For achieving this
consistency, careful planning is needed on the following fronts:

(a) Instructions: The instructions to be given in the study/experiment should be stated clearly
so that the subjects may consistently attend to the stimulus and respond appropriately.
Particularly, the experimenter should describe the sequence of events, identify the stimuli they
should attend to and explain how to indicate a response. All these will create consistency, that is,
all subjects will perform the intended task without introducing extraneous
stimuli/behaviour that make the task different for different subjects.




(b) Automation: Automation provides better consistency with respect to the presentation of
stimulus. For example, when a stopwatch is used for controlling time for presenting visual
stimuli, it may introduce inconsistency because there may occur variation in actual presentation
time. Therefore, the best alternative is to rely on automated equipment (electronic timers, slide
projectors, and the like) for controlling and presenting the stimuli.
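As a modern illustration of automated timing, the sketch below presents a stimulus for a programmed
duration under software control and records how long it was actually shown. The function name and
the `show`/`hide` callbacks are hypothetical placeholders for whatever display equipment is used;
real laboratory software would typically also synchronize with the display hardware.

```python
import time

def present_stimulus(show, hide, duration_s):
    """Show a stimulus for a fixed duration and return the measured time."""
    start = time.perf_counter()
    show()
    # Sleep for whatever part of the programmed duration is still left.
    remaining = duration_s - (time.perf_counter() - start)
    if remaining > 0:
        time.sleep(remaining)
    hide()
    return time.perf_counter() - start

# Hypothetical display callbacks that merely log the events.
events = []
elapsed = [
    present_stimulus(lambda: events.append("on"), lambda: events.append("off"), 0.05)
    for _ in range(3)
]

for trial, seconds in enumerate(elapsed, start=1):
    print(f"trial {trial}: shown for {seconds:.4f} s")
```

Because every trial uses the same programmed duration, the presentation time no longer depends on
the reaction speed of a person operating a stopwatch.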

(c) Pilot studies: A pilot study is defined as the miniature version of a final study in which
the experimenter/researcher tests the procedures of the final study. In a pilot study a sample of the
type of subjects that will be used in the final study is tested and as a consequence, the experimenter
determines such matters as whether the given instructions are clear and the subjects perform
accordingly, whether the task is doable within the time constraints or other demands, and
whether the manipulation actually works and is strong. In addition to all these, the pilot study
also allows the researcher to detect any bugs in the equipment or procedure. In this way, a pilot
study provides the experimenter a good background for providing consistency of procedure.

(d) Group testing: Testing subjects in a group is more common and the advantage of group
testing is greater efficiency in collecting data. Apart from this, in group testing all subjects are
tested in the same condition at a time, and thus all subjects experience the same condition
consistently presented. The disadvantage is that the subjects may create unnecessary noise which
may provide distraction to some other subjects. Therefore, such extraneous behaviours have to
be controlled so that the potential confoundings may be prevented.

(e) Manipulation checks: A manipulation check is a sort of measurement which determines
whether the manipulation of the independent variable has influenced the behaviour of the subjects
in the environment. Usually such a check is made after the subjects have performed the task. It is
particularly useful for ensuring that an independent variable has the intended influence on the
intervening variable. For example, if the researcher manipulates frustration to influence the
subjects' intervening anger, and subsequently measures their aggressiveness, he can check the
manipulation by directly measuring the subjects' anger. If the subjects make the desired
responses, he can put greater confidence in concluding that differences in aggression are the
result of differences in anger. If such a thing does not happen, the experimenter may go back
and redesign the study.
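A manipulation check of this kind amounts to comparing the intervening variable across conditions.
The following sketch uses invented anger ratings and Welch's t statistic as one possible way to make
that comparison; the data, the 1 to 7 rating scale and the choice of statistic are assumptions made
for illustration only.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two independent means."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical anger ratings (1 to 7 scale) collected after the task.
anger_frustrated = [6, 5, 6, 7, 5, 6]
anger_control = [3, 2, 4, 3, 2, 3]

difference = mean(anger_frustrated) - mean(anger_control)
t_value = welch_t(anger_frustrated, anger_control)

print("mean difference in anger:", round(difference, 2))
print("Welch t:", round(t_value, 2))
```

In these made-up data the frustrated group reports anger about three scale points higher than the
control group, which would suggest that the frustration manipulation worked as intended.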

In this way, we find that there are several ways through which threats to reliability and
validity in an experiment can be controlled.


PHASES OR STAGES IN RESEARCH 

The purpose of this section is to provide the potential researcher with important steps needed for
conducting a scientific research. An experienced researcher knows that research is a tedious,
painful and slow-moving job. However, if he follows certain steps in conducting the research, the
work can be carried out smoothly with the least difficulty.


Identifying the Problem 
The first step in conducting a research is to identify the problem. The researcher must discover a 
suitable problem and define it operationally. A problem is defined as that interrogative statement, 
which shows a relationship between two or more variables in an unambiguous manner. For 
example, take the following statement: 

What is the relationship between academic ability and socio-economic status?

A problem has several other characteristics which become the relevant considerations in 
choosing a scientific problem. For identifying a good solvable problem, the investigator 
undertakes the review of the literature. A body of prior work related to a research problem is 
referred to as the literature. Scientific research includes a review of the relevant literature. When a 
researcher reviews the previous researches in related fields, he becomes familiar with several
aspects of the problem. The obvious advantage of a review of the literature is that the researcher
learns what has already been done, receives guidance and does not duplicate previous research.
The purpose of a review of the literature is fourfold. First, it acquaints the researcher with the
variables which have been found to be conceptually and practically important in the field. Thus,
the review of the literature helps in selecting the variables relevant for the given study. Second,
the review of the literature provides an estimate of the previous work done. This has a twofold
advantage: it avoids repetition of past work and provides an opportunity for its meaningful
extension. Many researches in social science leave questions open for further exploration, and
only after the review of the literature can this fact be known. Third, the review of the literature
helps the researcher in synthesizing and expanding his knowledge. This facilitates drawing useful
conclusions regarding the variables under study and provides a meaningful way for their
subsequent applications. Fourth, the review of the literature helps in redefining the variables and
determining the meanings and relationships among them, so that the researcher can build up a
case as well as a context for further investigation which has merit and applicability.

There are different sources of review of the literature. Journals, books, abstracts,
periodicals, etc., are the major sources of review of the literature. Chapter 19 has been
devoted to the review of the literature.


Formulating a Hypothesis

When the researcher has identified the problem and reviewed the relevant literature, he
formulates a hypothesis, which is a kind of suggested answer to the problem. A hypothesis is a
tentative statement showing a relationship between variables under study. It is stated
in the form of a declarative sentence. For example:

1. Persons coming from the upper and middle socio-economic status have stronger
academic ability than persons coming from a lower socio-economic status.

2. Reward facilitates learning.

A research hypothesis has several characteristics. First, the hypothesis
should be consistent with known facts and theories. Second, it should be testable. In other words,
the hypothesis should be such that it can be restated in a way that the researcher can test it and
show that it is probably true or probably false. Third, a hypothesis should be reasonable and
expressed in the simplest possible words.

For unbiased research, the researcher must formulate a hypothesis in advance of the
data-gathering process. No hypothesis should be formulated after the data are collected.
Chapter 18 presents a detailed discussion regarding the different aspects of hypothesis.


Identifying, Manipulating and Controlling Variables

Variables are defined as those characteristics which are manipulated, controlled and observed by
the experimenter. At least three types of variables must be recognized at the outset: the
dependent variable, the independent variable and the extraneous variable. The dependent
variable is one about which the prediction is made on the basis of the experiment. In other
words, the dependent variable is the characteristic or condition that changes as the experimenter
changes the independent variables. The independent variable is that condition or characteristic
which is manipulated or selected by the experimenter in order to find out its relationship to some
observed phenomena. The extraneous variable is the uncontrolled variable that may affect the
dependent variable. The experimenter is not interested in the changes produced due to the
extraneous variable and hence, he tries to control it as far as practicable. The extraneous variable
is also known as the relevant variable. In Chapter 20, a detailed discussion of the different
variables commonly used in psychological, sociological and educational research is presented.






















Formulating a Research Design

A research design may be regarded as the blueprint of those procedures which are adopted by the
researcher for testing the relationship between the dependent variable and the independent
variable. There are several kinds of experimental designs and the selection of any one is based
upon the purpose of the research, the types of variables to be controlled and manipulated as well as
upon the conditions under which the experiment is to be conducted. The main purpose of the
experimental design is to help the researcher in manipulating the independent variables freely
and to provide maximum control of the extraneous variables so that it may be said with all
certainty that the experimental change is due only to the manipulation of the experimental
variable. In Chapter 21, a detailed discussion of the different types of research designs has
been presented.


Constructing Devices for Observation and Measurement 


When the research design has been formulated, the next step is to construct or collect the tools of
research for scientific observation and measurement. Questionnaires, opinionnaires and
interviews are the most common tools which have been developed for psychological,
sociological and educational research. All these tools of research are ways through which data
are collected by asking for information from persons rather than observing them. These
techniques have been discussed separately in Chapter 12.


Summarizing Results 


The next step in scientific research is to summarize the results so that a suitable analysis can be
made. There are two common methods for summarizing results: the tabular method and the
graphic method. In the tabular method the obtained data are reduced to some convenient tables,
which facilitate the use of appropriate statistical tests. In the graphic method, the obtained data
are shown through graphs and pictures. In general, the graphic method has an advantage over the
tabular method in the sense that it provides quick understanding to those who
examine it. But the general limitation of the graphic method is that complex data are difficult to
display whereas the same can be easily shown through the tabular method. A good
researcher, therefore, should summarize the results by utilizing both the methods.
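The two methods of summarizing can be sketched side by side in Python. The scores below are
invented, and the function names are hypothetical; the text histogram merely stands in for a proper
graph.

```python
from collections import Counter

def frequency_table(scores):
    """Tabular method: return (score, frequency) rows sorted by score."""
    return sorted(Counter(scores).items())

def text_histogram(rows):
    """Graphic method: one line per score, one asterisk per observation."""
    return [f"{score:>3} | {'*' * freq}" for score, freq in rows]

# Hypothetical test scores of 12 subjects.
scores = [4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8]

table = frequency_table(scores)
print("score  frequency")
for score, freq in table:
    print(f"{score:>5}  {freq:>9}")

for line in text_histogram(table):
    print(line)
```

The table keeps the exact frequencies needed for later statistical tests, while the histogram shows
the shape of the distribution at a glance.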

Carrying out Statistical Analysis 

When the data have been reduced to the tabular form, the next step is to carry out appropriate
statistical analysis. There are two types of statistical tests: the parametric test and the
nonparametric test. Depending upon the nature of data and purpose of the experiment, either a
parametric statistic or a nonparametric statistic is chosen for statistical analysis. In general, the
purpose of carrying out the statistical analysis is to test whether the null hypothesis can be
rejected so that the alternative hypothesis may be accepted. Commonly, there are two levels of
significance at which the null hypothesis is rejected: the 0.05 level (or 5% level) and the 0.01
level (or 1% level). These levels are also known as the alpha level. A separate chapter
(Chapter 23) has been devoted to statistical analysis.
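To show how a test result is compared with the two alpha levels, the sketch below runs an exact
permutation test, one example of a nonparametric test, on two small invented groups. The data and
function name are hypothetical, and the permutation test is chosen here only because it can be
written with the standard library; it is not singled out by the text.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference between two means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of splitting the pooled scores into two groups
    # of the original sizes.
    for indices in combinations(range(len(pooled)), len(group_a)):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical scores of an experimental and a control group.
experimental = [12, 14, 15, 16]
control = [8, 9, 10, 11]

p_value = permutation_p_value(experimental, control)
for alpha in (0.05, 0.01):
    decision = "rejected" if p_value < alpha else "not rejected"
    print(f"alpha = {alpha}: null hypothesis {decision} (p = {p_value:.4f})")
```

For these data only 2 of the 70 possible splits are as extreme as the observed one, so the null
hypothesis is rejected at the 0.05 level but not at the stricter 0.01 level.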


Drawing Conclusions 


The investigator, after analyzing the result, draws some conclusions. In fact, the investigator
wants to make some statement about the research problem which he could not make without
conducting his research. Whatever conclusion is arrived at, he generalizes it to the whole
population. At this stage, the investigator also makes some predictions about certain related
events or behaviour in new situations.


TYPES OF EDUCATIONAL RESEARCH

It is very difficult to classify the types of educational research. This fact is further borne out
by the observation that almost every textbook suggests a different system of classification.
However, the attempt made by Best & Kahn (1998) in this direction is a very scientific one.
Their classification is so wide and comprehensive that all researches fall in one of the
following three types or a combination thereof.

1. Historical research: A historical research is one which investigates, records, analyzes
and interprets the events of the past for the purpose of discovering sound generalizations
that are helpful and useful in understanding the past and the present and, to a limited
extent, the anticipated future. Thus, historical research describes what was.

2. Descriptive research: A descriptive research is one which describes, records, analyzes
and interprets the conditions that exist. In such a research an attempt is made to discover
relationships between existing nonmanipulated variables, apart from some comparison
or contrast among those variables. Thus descriptive research basically describes what is.
It is also known as nonexperimental or correlational research.

3. Experimental research: An experimental research is one in which the primary focus is
upon the variable relationship. Here certain variables are controlled or manipulated and
their effect is examined upon some other variables. Thus experimental research basically
describes what will happen when certain variables are carefully controlled
or manipulated.

We shall discuss in detail the descriptive or nonexperimental research and experimental
research in this chapter, whereas historical research will be discussed in Chapter 17.


TYPES OF RESEARCH: EXPERIMENTAL AND NONEXPERIMENTAL

Any attempt to classify behavioural research presents a difficult problem because there is no
universally acceptable classification of research. Almost every textbook suggests a different
system of classification of research and this fact in itself is convincing evidence for the
nonavailability of a universal system of classification. Despite these differences, an attempt is
made here to present a near-common classification of research.

Based on the application of research study, the most general way of classifying research is to
divide it into fundamental (or pure or basic) research and applied research. A fundamental
research is a process where the researcher's aim is to develop a theory by identifying the
important variables in a situation and by discovering broad generalizations about those variables.
It utilizes a careful sample so that its conclusions can be generalized beyond the immediate
situation. However, a fundamental research has little concern with the actual application of its
principles or generalizations. Applied research, as its name implies, applies the principles of
fundamental research to the actual solving of problems. It has many characteristics of
fundamental research. It also tries to make generalizations about the population and uses
various sampling techniques for selecting samples for its study. The main purpose of applied
research is not to develop theories but to test those theories in actual situations. A
popular type of applied research is 'action research'. In action research the researcher
emphasizes a problem which is immediate, urgent and has local applicability. Thus, the
researcher here focuses upon the immediate consequences and applications of a problem and
not upon generalization or upon the development of a theory. For example, a researcher may
undertake to study the reasons underlying unhealthy classroom conditions so that the findings
may immediately benefit the local classroom students. Types of applied research will be discussed
later.


Depending upon the objective of the research, it may be classified into experimental
research and nonexperimental research. An experimental research is one where the independent
variables can be directly manipulated by the experimenter, and participants or subjects are
randomly assigned into different treatment conditions. In the social sciences, experimental
research is sometimes called S-R research because the researcher manipulates a stimulus (or
stimuli) in order to establish whether or not this produces a change in a certain response (or
responses). It is further divided into two main types: laboratory experiments and field
experiments. A nonexperimental research is one where independent variables cannot be
manipulated and therefore, cannot be experimentally studied. The subjects, too, are not
randomly assigned into different treatment conditions. In nonexperimental research, also called
ex post facto research, the responses of a group of subjects are measured on one variable and then
compared with their measured responses on another variable. Such research is also called R-R
research because changes in one set of responses (R) are compared with possible changes in
another set of responses (R). A nonexperimental research, or descriptive research, can be divided
into six main types: field studies, ex post facto research, survey research, content analysis, case
study and ethnographic study. Perhaps more than half of the researches in psychology, sociology
and education are nonexperimental. A detailed discussion of each of these six types of researches
as well as the two types of experimental research is presented ahead.
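Random assignment, the feature that separates experimental from nonexperimental research in the
paragraph above, can be sketched in a few lines of Python. The participant labels, the condition
names and the function name are hypothetical; any procedure that gives each participant an equal
chance of landing in each condition would serve.

```python
import random

def randomly_assign(participants, conditions, seed=0):
    """Shuffle participants and deal them out to the conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups

# Hypothetical pool of 20 participants and two treatment conditions.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, ["experimental", "control"])

for condition, members in groups.items():
    print(condition, sorted(members))
```

Dealing the shuffled list out in turn keeps the group sizes equal, and fixing the seed makes the
assignment reproducible for record-keeping; a different seed gives a different but equally random
allocation.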

Based on inquiry mode, there are two types of research: quantitative research and
qualitative research. A quantitative research is a research where objectives, design, sample and
statistical analysis are predetermined. It is also sometimes known as structured research. A
qualitative research is one which allows flexibility in all these aspects of the process. A detailed
discussion of qualitative research is presented in Chapter 22. This research is also known as
unstructured research. In general, a quantitative research is more appropriate for determining the
extent of the problem whereas qualitative research is more appropriate for exploring the nature of
the problem. Figure 15.1 presents the classification of research from these three perspectives
(application, objectives and inquiry).


[Figure: Types of Research. From the viewpoint of application: pure research, applied research.
From the viewpoint of objectives: experimental, nonexperimental. From the viewpoint of inquiry:
quantitative research, qualitative research.]

Figure 15.1 Classification of research from different modes


A laboratory experiment is one of the most powerful techniques for studying the relationship
between variables under controlled conditions. It may be defined as the study of a problem in a
situation in which some variables are manipulated and some are controlled in order to have an
effect upon the dependent variable. The variables which are manipulated are known as
independent variables and the variables which are controlled are known as extraneous or
relevant variables. Thus in a laboratory experiment, the effect of manipulation of an independent
variable upon the dependent variable is observed under controlled conditions. Festinger and Katz
(1953, 137) have defined a laboratory experiment as "one in which the investigator creates a
situation with the exact conditions he wants to have and in which he controls some, and
manipulates other, variables." Analyzing this definition, two salient features of a laboratory
experiment emerge.

First, in a laboratory experiment the experimenter creates a situation in which the
possible extraneous variables are controlled so that the variances produced by them are also
controlled or kept at a minimum.

Second, in a laboratory experiment the variables are manipulated (called independent
variables) and the effect of manipulation of these variables upon the dependent variable
is examined.

There are several ways of manipulating and controlling the variables, which have been
discussed separately in another chapter. However, one of the most common methods used in
behavioural research is the use of pre-experimental instruction. At the start of the experiment, the
experimenter gives some instructions verbally in a manner that the variables are automatically
manipulated. For example, if one wants to examine the effect of competition upon the
performance of a task, one may take two groups of subjects matched in age, sex and intelligence.
Subsequently, one may impart instruction to one group in a way which enhances the feeling of
competition and impart a neutral instruction to the other. For instance, for enhancing the
sense of competition one may say that all subjects would be ranked in terms of their individual 
performance and that the person who contributes most would get the highest rank and the person 
who contributes the least would get the lowest rank. In a group where no sense of competition is
to be introduced, instruction may simply specify that the members of the group are required to 
perform the task as accurately and quickly as possible. In the above experiment, the competition 
was the independent variable and the performance of the task was the dependent variable. One 
group was instructed in a way to enhance the competition (the independent variable) and 
another group was a neutral group having no instruction for enhancing the feeling of 
competition. The former group is known as the experimental group and the latter group is known 
as the control group. 
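
The two-group design just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject records, the `iq` field used for matching and the helper name `matched_groups` are hypothetical and do not come from the text. The sketch pairs subjects with similar scores and then lets chance decide which member of each pair receives the competition instruction.

```python
import random
from statistics import mean


def matched_groups(subjects, key, seed=0):
    """Pair subjects with adjacent values of `key`, then split each pair at random.

    Returns (experimental, control). With an odd number of subjects the
    last one in sorted order is left unassigned.
    """
    rng = random.Random(seed)
    ordered = sorted(subjects, key=key)
    experimental, control = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # chance decides who gets the competition instruction
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control


# Hypothetical subjects: 20 people with invented intelligence scores.
subjects = [{"id": i, "iq": 90 + i} for i in range(20)]
experimental, control = matched_groups(subjects, key=lambda s: s["iq"], seed=42)

# The groups should be nearly equal on the matching variable before treatment.
iq_gap = mean(s["iq"] for s in experimental) - mean(s["iq"] for s in control)
print(len(experimental), len(control), iq_gap)
```

Matching on one variable is a simplification; the text matches on age, sex and intelligence together, which would need a composite key or a more careful pairing rule.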
Following Kerlinger (1986), there are three main purposes of a laboratory experiment. First, a laboratory experiment purports to discover a relationship between the dependent variable and the independent variable under pure, uncontaminated and controlled conditions. When a particular relationship is discovered, the experimenter is better able to predict the dependent variable. Second, a laboratory experiment helps in testing the accuracy of predictions derived from theories or other research. Third, a laboratory experiment helps in building theoretical systems by refining theories and hypotheses and thus provides a breeding ground for scientific evaluation of those theories and hypotheses.
A laboratory experiment has some strengths and weaknesses. Its primary strengths or advantages may be enumerated as given below.

1. In a laboratory experiment the extraneous variables are maximally controlled, so the dependent variable is influenced mainly by the manipulation of the independent variable. A relationship studied in such a controlled situation is more dependable and thus the laboratory experiment has the fundamental requisite for any investigation, that is, internal validity.

2. A laboratory experiment is replicable. If one has some doubt over the conclusion reached by a particular experimenter, one can replicate the design, conduct the experiment and verify the conclusion. A laboratory experiment is also replicated when one wants to substantiate or refute the findings of earlier laboratory experimenters.

3. A laboratory experiment provides a good opportunity for the manipulation of the independent variables. Not only this, if the experimenters want to assign the subjects randomly to different treatment conditions, they can do so easily with minimum risk of introducing subjectivity in the controlled situation.

4. A laboratory experiment has a sufficient degree of internal validity because the experimenter usually has maximum possible control over the extraneous variables and the manipulation of independent variables.

Despite its strengths, a laboratory experiment has some weaknesses, as given below.

1. A laboratory experiment lacks external validity, although it has a sufficient degree of internal validity. When a laboratory experimenter, for example, reaches some conclusions regarding the rate of learning a maze by a rat, can he generalize it to human maze learning? The answer is yes, to some extent. Even when he does so, he does this at considerable risk. This lack of external validity makes the laboratory experimenter unable to make generalizations with full confidence.

2. The experimental situation of a laboratory experiment is said to be an artificial and a 
weak one. But this criticism appears to be inappropriate because it results from inaccurate 
understanding of the purpose of a laboratory experiment. In fact, a laboratory experiment should 
not, and need not, be an exact duplication of a real-life situation. If one wants to study some 
problems in a real-life situation, he should not bother to set up a laboratory experiment 
duplicating such a situation. A laboratory experiment simply creates a situation where the 
variables are controlled and manipulated under specially defined conditions. Such a situation 
may or may not be encountered in real life. In fact, some laboratory situations are such that they 
can never be encountered in real life. It may, therefore, be concluded that the criticism regarding 
the creation of an artificial situation comes mainly from those persons who misunderstand the 
purpose of a laboratory experiment. 

3. In most laboratory experiments, a complex design is followed and as a consequence, 
more than two variables are simultaneously manipulated. It is a matter of common observation 
that if more variables are manipulated simultaneously, the strength of each variable is lowered. 
This is particularly obvious in those laboratory situations where the manipulation of the variables 
is done through verbal instructions. 

4. According to Robinson (1976), there are some situations like mass riots which can't be
studied in a laboratory because of sheer physical impossibility.

5. For want of social approval, the behaviour of persons in certain situations may not be studied by laboratory research. For example, if the experimenter is interested in studying the impact of deprivation of visual sense modality upon depth perception in children, he may not be able to carry out the laboratory experiment, because no parents will allow their children to experience such harmful deprivation.

6. Laboratory-experiment research is costly as well as time-consuming. Some of the apparatuses are so costly that the experimenter normally fails to buy them and hence remains unable to conduct the experiment. Not only this, some experiments, particularly those relating to longitudinal genetic studies on humans, can't be successfully carried out because the time required to carry through even a few generations would be very long.

Despite these weaknesses, laboratory experiments are the greatest achievements of behavioural scientists. Such experiments possess the most important prerequisite for any behavioural investigation, that is, internal validity, although they have the trait of poor generalizability.


Field Experiment

Field experiment is one of the common research tools in the hands of sociologists, educational psychologists and social psychologists. It is similar to a laboratory experiment. A field experiment is a controlled study carried out in a more or less realistic situation or field where the experimenter successfully manipulates one or more independent variables under the maximum possible controlled conditions. Following Shaughnessy & Zechmeister (1990), when the experimenter manipulates one or more independent variables in a natural setting for determining their effect upon behaviour, the procedure is known as a field experiment. In sum, a field experiment is an experiment conducted in a natural setting. The setting may be a school, a factory, a hospital, a shopping complex, a street corner or any place in which the behaviour can be identified. Such experiments are common when the research is more applied or seeks to examine complex behaviours that occur in a natural setting. Analyzing this definition, we get three salient features.

First, a field experiment is carried out in a more or less realistic situation and thus differs from a laboratory experiment, which is carried out in the artificial situation of a laboratory.

Second, a field experimenter, like the laboratory experimenter, also manipulates the independent variables and controls the extraneous variables.

Third, a field experimenter manipulates the variables under as carefully controlled conditions as the situation permits. On this point, a field experiment differs from a laboratory experiment. In the latter case the situation is controlled in all possible respects whereas in the former the degree of control depends upon what the situation permits. As Kerlinger (1973, 401) has opined, "Sometimes it is hard to label a particular study 'laboratory experiment' or 'field experiment.' Where the laboratory experiment has a maximum of control, most field experiments must operate with less control, a factor that is often a severe handicap to the experiment."

Like the laboratory experiment, the field experiment also has strengths and weaknesses. The principal strengths may be listed as follows:

1. A field experiment deals with the realistic life situation. Hence, it is more suited for studying social changes, social processes and social influences. Several field experiments have been conducted for studying these social processes and fruitful results have been obtained.

2. Since in a field experiment observations are done in natural settings, such studies generally have a high degree of external and ecological validity. The obtained findings can be generalized to the real world because they are obtained in the real world.

3. In a field experiment, we can observe behaviour when subjects are psychologically engaged in a real situation. As a consequence, we have greater experimental realism, which is defined as the extent to which the experimental task engages subjects psychologically such that they become less concerned with demand characteristics. Such an advantage is especially important because, instead of depending upon potentially unreliable self-reports of what subjects do, we can observe what they are really doing in the situation.

4. In a field experiment, we can go into the real world and replicate laboratory findings. After knowing about a behaviour in the laboratory, we go out and see whether it operates the way we think it does.


The principal weaknesses of field experiments are as given below.

1. Since a field experiment is carried out in a realistic situation, there is always a possibility that the effect of independent variables is contaminated with uncontrolled environmental variables. Factors such as noise and gatherings may affect the dependent variable and thereby contaminate the influence of the independent variable. In a laboratory experiment, this problem does not arise because of the fully controlled laboratory situation. However, if the situation is somehow fully controlled in a field experiment, it would prove to be a more powerful tool than a laboratory experiment.

2. In many field situations, the manipulation of independent variables may be difficult due to non-cooperation of subjects. For example, suppose the investigator wants to test whether or not frustration leads to aggression in a group of small children. He may be producing frustration in one group of children (called the experimental group) and may be keeping another group of children a neutral one, that is, free from frustration. If the parents of the children come to know that their wards are to be exposed to frustrating situations, they may not like it and may restrain their children from being exposed to field situations. Likewise, numerous examples of a field experiment can be cited where the manipulation of independent variables may pose difficulties to the field experimenter due to similar reasons.

3. In a field experiment, it is not possible to achieve a high degree of precision or accuracy because of some uncontrolled environmental variables. But in the case of a laboratory experiment it is possible to achieve a high degree of precision because all extraneous variables are controlled to the maximum possible extent.

4. A field experiment requires that the investigator has high social skills to deal effectively with people in a field situation. It automatically implies, then, that an investigator lacking social skills will not be very effective in conducting a field experiment. It has also been observed that even where the investigator possesses a high degree of skill, a field experiment usually takes such a long time that subjects' cooperation with a uniform spirit becomes doubtful. This tends to lower the validity of the experiment.


Field Studies 
Any ex post facto scientific study which systematically discovers relations and interactions among variables in real life situations such as a school, factory, community, college, etc., may be termed as a field study. There are two important features of a field study.

First, a field study is an ex post facto study, and an ex post facto study is one where the investigator tries to trace an effect that has already been produced to its probable causes.

Second, in any field study no independent variables are manipulated and thus it differs from a field experiment where the independent variables are manipulated for determining relations among variables. In a field study, the investigator depends upon the existing conditions of a field situation as well as upon the selection of subjects for determining the relationship among variables.

Although field studies bear some similarity to survey studies, the two types of research differ. First, in a survey the emphasis is upon the selection of a representative sample to make an accurate description of the characteristics of the larger universe (survey research) and the characteristics of a fraction of the universe (sample survey). But in field studies the emphasis is upon the processes under investigation, rather than upon their typicality in the universe or population (Katz 1954). The investigator in a field study, therefore, may or may not give that emphasis to representative sampling. Second, the investigator in a field study generally deals with a single group, community or section for analyzing the social and psychological processes whereas in survey studies, the investigator generally takes a cross-sectional group or cross-community for analyzing social and psychological processes. Thus, in a field study the group being studied is small as compared to survey studies and therefore, it can be reasonably expected that a field study will provide a better picture of social interactions or interrelations than survey studies.





Types of Field Studies

Katz (1953) has divided field studies into two types:
(a) Exploratory field studies
(b) Hypothesis-testing field studies

A discussion of these two follows.

(a) Exploratory field studies: An exploratory field study is one that intends to discover significant variables in the field situation and find out relations among those variables so that the groundwork for better and more systematic testing of hypotheses can be laid. Obviously, an exploratory field study does not aim to predict relations to be found later on. On the basis of such a field study, the investigator is able to find out a relationship among variables but he remains unable to provide concrete proof of the existing relationship. Suppose the investigator wants to investigate the correlates of productivity in a factory through an exploratory field study. He may correlate productivity with several factors like age and sex of the employees, period of work, pay, etc. But he will not be formulating any hypothesis relating to the productivity and any of these variables beforehand. It may be that on the basis of such field studies, he may provide some facts which would be explored in future.

(b) Hypothesis-testing field studies: A hypothesis-testing field study is one in which the investigator formulates some hypotheses and then proceeds to test them, providing some concrete evidence for such testing. Thus, here the investigator aims at predicting relations among variables. Suppose the above study is done as a hypothesis-testing field study. The investigator will proceed by formulating certain hypotheses and subsequently he will test them. He may, for example, formulate the hypothesis that poor pay for prolonged hours of work may result in lower productivity. On the basis of the results obtained by the hypothesis-testing field study, he will verify the truth of the hypothesis.

Both these types of field studies are common but the hypothesis-testing field study is more popular than the exploratory field study.
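
The exploratory example above, in which productivity is correlated with several factors without any prior hypothesis, can be sketched as follows. This is a minimal illustration, not the author's procedure: the function name `pearson_r`, the choice of the Pearson coefficient and every number in the data are assumptions made for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented records for eight factory employees (illustration only).
productivity = [52, 61, 47, 70, 66, 58, 49, 73]
factors = {
    "age": [45, 38, 50, 29, 33, 41, 48, 27],
    "pay": [310, 360, 300, 420, 400, 340, 305, 450],
    "hours_worked": [9, 8, 10, 7, 8, 9, 10, 7],
}

# Exploratory step: correlate productivity with each factor in turn.
for name, values in factors.items():
    print(name, round(pearson_r(productivity, values), 2))
```

Such coefficients only show which variables go together; as the text notes, they do not amount to concrete proof of a relationship.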


The principal advantages of field studies are given below.

1. A field study provides opportunities for direct observation of social interaction and relationships. The investigator can directly observe how people actually behave when placed in a group situation.

2. A field study is usually carried out in a realistic situation like a school, college, factory, community, etc. As such, it avoids the artificialities of laboratory experiments. A field study may, therefore, be more useful for an educator and a sociologist than for a psychologist.

3. In a field study, continued observation for a given period of time is possible because the situation tends to persist for that period of time. One advantage of this continued observation, says Katz (1954), is that "the timing of certain variables may be ascertained."

4. The investigator in a field study is allowed, as Katz says, to record "reciprocal perceptions and interdependent reactions" from groups of people. One advantage of the reciprocal perceptions and interdependent reactions is that they give a total picture of the social structure, the complexity of which might otherwise be missed.


Despite these advantages, a field study has the following weaknesses:

1. Generally, the field situations in which field studies are carried out are so complex that they make the precise measurement of the variables a very difficult task. When variables cannot be measured precisely, it adversely affects the internal validity and subsequently the external validity of the studies.

2. In any field study, there are a large number of variables. In experimental research like a laboratory experiment or field experiment, these variables can be controlled with greater satisfaction. But in field studies, these variables cannot be fully controlled or can be controlled with less satisfaction. Again, this tends to lower the internal validity of a field study.

3. A field study also suffers from lack of practicality. It generally takes a longer time, its cost is high and its samples are usually large. All these factors force the investigator to avoid a field study as far as possible.

Difference between Field Experiment and Field Study

Field experiment and field study are two important modes of social scientific research. Both are similar in the sense that both these research studies are conducted in fields, that is, in a realistic situation. Another similarity is that both avoid the use of controlled conditions of the laboratory. Despite these similarities, a field experiment is different from a field study as indicated below.

1. A field experiment is an experimental research whereas a field study is a nonexperimental research.

2. In a field experiment, the independent variables are manipulated and their impact upon dependent variables is examined. In a field study the investigator does not manipulate variables; rather he aims at discovering the relations and interactions among psychological, sociological and educational variables.

3. A field experiment is more precise than a field study. In a field experiment the experimenter is able to maintain control to a greater extent and, therefore, precision in measurement of field variables is maintained. But in field studies, the problem of such precision is more acute.

Ex Post Facto Research 


An ex post facto research is one in which the investigators attempt to trace an effect which has already occurred to its probable causes. Thus the term ex post facto research means that the researcher has conducted the study after the events have occurred. (The phrase ex post facto means "after the fact".) The effect becomes the dependent variable and the probable causes become the independent variables. Thus in ex post facto research the manifestation of independent variables occurs first and then its effect becomes obvious to the investigator. Since the independent variables have already occurred, the investigator has no direct control over such variables. As such, the purposeful manipulation of the independent variable becomes difficult. A simple definition of ex post facto research may be formulated as given below.

Ex post facto research is that empirical investigation in which the investigator draws the inference regarding the relationship between variables on the basis of such independent variables whose manifestations have already occurred. In this type of research, the investigator has no direct control over the independent variables because they occur much prior to producing their effects.

Suppose the investigator takes a case of lung cancer and then goes back to explore the probable causes of it. He may find that cigarette smoking and a chronic cough are most commonly associated with lung cancer. Here, lung cancer is the dependent variable and cigarette smoking and the chronic cough are examples of independent variables. These two independent variables have already occurred and therefore are beyond the direct control of the investigator. It is difficult for the investigator to control these independent variables either by manipulation or by randomization. Take another example. Suppose the investigator wants to study the major determinants of academic achievement among primary-school-going children. After a review of the literature, he may find that three factors are most likely to produce differences in academic achievement. The three factors are SES (socio-economic status), motivation and intelligence. After treatment of the data he may find that there is a significant difference in academic achievement of upper SES children, middle SES children and lower SES children. The first group and the second group of children achieve higher than the third group. He then concludes that SES is one of the determinants of academic achievement. Likewise, he may find that higher motivation and higher intelligence are associated with higher academic achievement. In the above example, the nature of an ex post facto study is obvious. The


dependent variable is academic achievement and the independent variables are SES, motivation and intelligence, over which the investigator has no direct control. Ex post facto research is also known as causal-comparative research or, when correlational analyses are used, is referred to as correlational research (Best & Kahn, 1998).

Ex post facto research can be differentiated from experimental research. The following points of difference between these two types of research may be enumerated:

1. In experimental research, the investigator has direct control over the independent variables. By direct control we mean the investigator's power to manipulate the independent variables. For example, to study the effect of reward upon learning how to spell words in children, one may give a reward to one group of children and have a control group of children who receive no reward for learning the spellings of words. Here the investigator directly manipulates the independent variable, that is, the reward. In ex post facto research the investigator has no direct control over the independent variables, probably because of two reasons. First, the independent variables occur prior to the effect they produce and second, independent variables in ex post facto research are inherently not directly or experimentally manipulable. Variables such as home background, school environment, aptitude, intelligence and parental influence are not manipulable. Thus ex post facto research lacks the direct control of independent variables.

2. In experimental research the random assignment of subjects to different groups and the assignment of treatments to different groups at random can be easily done, but in ex post facto research, random assignment of subjects is not possible because the investigator is bound to take things as they are.

3. In experimental research, the investigator can make a statement regarding the relationship between the dependent variable and the independent variables with more certainty than in ex post facto research. This is because in the former the investigator can manipulate the independent variables and hence can better relate the changes in the independent variables with changes produced in the dependent variables. This is not possible in ex post facto research.


Ex post facto research has some strengths and weaknesses. Its principal strengths or advantages may be enumerated as follows:

1. Ex post facto research is considered to be very important in behavioural research where many variables are either not manipulable or not amenable to experimental enquiry. Many sociological and educational variables belong to these categories. However, these variables work satisfactorily with the controlled enquiry of ex post facto research.

2. In some circumstances, particularly when one wants to investigate causes on the basis of the effect, ex post facto research is more useful than experimental research.

Its principal weaknesses are given below.

1. In ex post facto research, as has been discussed earlier, the investigator cannot manipulate the independent variables. When the independent variables cannot be manipulated, the forecast regarding the relationship between the independent variables and the dependent variables becomes dubious.

2. In ex post facto research, the investigator cannot exercise control over independent variables through randomization. He cannot assign the subjects to different groups at random nor can he assign the various treatments to the different groups at random.

3. The third limitation is that the investigator may not be able to provide a plausible explanation for the relationship between the independent and the dependent variables. Generally, the investigator falls prey to what is called the post-hoc fallacy, which says that because two factors go together, one is the cause and the other is the effect. Because there seems to be a close relationship between smoking and cancer, one may conclude that smoking is the cause and cancer is the effect, whereas in reality glandular imbalance may be the actual cause of cancer. Such imbalance may induce a certain amount of tension. Since excessive smoking is a tension-releasing mechanism, such persons would tend to be smokers. In such a situation the cancer can result from glandular imbalance, rather than from smoking, which is merely a symptom.

Despite these limitations, ex post facto research is a popular method of research for educational and sociological problems. Such research is also being conducted in the field of psychology.
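
The post-hoc fallacy described above can be made concrete with a small simulation. The sketch below models only the text's hypothetical scenario, in which a third variable drives both smoking and the disease; all probabilities, the function names `simulate` and `rate`, and the sample size are invented for illustration and say nothing about real medical data.

```python
import random


def simulate(n=20000, seed=0):
    """Generate (imbalance, smokes, disease) records for the hypothetical scenario.

    The disease depends on the third variable only, never on smoking.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        imbalance = rng.random() < 0.30                        # hidden third variable
        smokes = rng.random() < (0.70 if imbalance else 0.20)  # imbalance raises smoking
        disease = rng.random() < (0.10 if imbalance else 0.01)
        rows.append((imbalance, smokes, disease))
    return rows


def rate(rows, smokes, imbalance=None):
    """Disease rate among smokers or non-smokers, optionally within one imbalance level."""
    selected = [r for r in rows
                if r[1] == smokes and (imbalance is None or r[0] == imbalance)]
    return sum(r[2] for r in selected) / len(selected)


rows = simulate()
# Ignoring the third variable, smokers appear to be at much higher risk.
print("all subjects:", rate(rows, True), rate(rows, False))
# Holding the third variable constant, the apparent effect of smoking largely disappears.
print("imbalance present:", rate(rows, True, True), rate(rows, False, True))
print("imbalance absent:", rate(rows, True, False), rate(rows, False, False))
```

An ex post facto investigator who never measured the third variable would see only the first comparison, which is why the text treats causal statements from such research as dubious.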


Survey Research

Survey research is a new technique for social science research. Survey, as such, is quite an old technique and was largely developed in the eighteenth century. However, in the second part of the nineteenth century a systematic literature was made available by Booth, who is regarded as the father of social surveys (Moser & Kalton 1971). But survey research, as a special branch of social scientific research, is considered a new technique developed in the twentieth century.

Survey research is based on the simple principle that if one wants to find out what people think about some topic, just ask them. In other words, a survey is a structured set of questions or statements given to people in order to measure their attitudes, beliefs, values or tendencies to act.

Survey research, mostly used by psychologists, sociologists and anthropologists, should be distinguished from sample survey, which is its close ally. The survey researcher is primarily interested in assessing the characteristics of the whole population. Thus, survey research may be defined as a technique whereby the researcher studies the whole population with respect to certain sociological and psychological variables. For example, if a researcher wants to study how many people of both sexes in India adopt contraceptive devices as a measure of birth control, this will constitute an example of survey research. But a survey researcher rarely takes pains to approach each member of the population or universe, probably because it requires a lot of time, money and patience. Thus he takes a convenient random sample, which is considered to be representative of the whole universe, and subsequently an inference regarding the entire population is drawn. When a researcher takes a sample from the population for studying the relative incidence, distribution and relationship of psychological and sociological variables, the survey is termed as a sample survey.
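
The inference from a sample to the whole population can be sketched as follows. This is a minimal illustration assuming simple random sampling and a yes/no characteristic; the population of 100,000 records, its 40 per cent "yes" rate and the function name `sample_proportion` are hypothetical and chosen only for the example.

```python
import random
from math import sqrt


def sample_proportion(population, n, seed=0):
    """Draw a simple random sample of size n and estimate the population proportion.

    `population` is a list of 0/1 values. Returns (estimate, standard_error),
    where the standard error uses the usual sqrt(p * (1 - p) / n) approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    p = sum(sample) / n
    return p, sqrt(p * (1 - p) / n)


# Hypothetical universe: 100,000 people, 40% of whom have the characteristic.
population = [1] * 40_000 + [0] * 60_000

# A sample survey questions only 1,000 of them.
estimate, std_error = sample_proportion(population, 1_000, seed=1)
print(f"estimated proportion: {estimate:.3f} (standard error {std_error:.3f})")
```

Questioning every member would give the exact figure, at the cost in time and money that the text mentions; the sample gives an estimate together with a measure of its uncertainty.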
Survey research depends upon three important factors.

1. As survey research deals with the characteristics, attitudes and behaviours of individuals or a group of individuals called a sample, direct contact with those persons must be established by the survey researcher.

2. The success of survey research depends upon the willingness and the co-operativeness of the sample selected for the study. The people selected for the survey research must be willing to give the desired information. In case they are not willing and do not co-operate with the survey researcher, he should drop the plan in favour of some other technique.

3. Survey research requires that the researcher be a trained person. He must have manipulative skill and research insight. He must possess social intelligence so that he may deal with people effectively and be able to extract the desired information from them.


Types of Survey Research 


Depending upon the ways of collecting data, survey research can be classified into different categories, namely, personal interview, mail questionnaire, panel technique and telephone survey. A detailed discussion of each of them is presented below.


ge Fe 
Personal Interview

The personal interview, also known as the survey interview, is one in which a direct conversation with the respondent is held with a view to eliciting some information from the latter. The situation of the personal interview is such that the interviewer neither tries to help the respondent nor to educate him. He is simply interested in eliciting information from the respondent, where he (the respondent) is likely to be one of many from whom similar information is sought. According to Cannell and Kahn (Lindzey & Aronson 1968), there are three conditions for a successful interview. The first is accessibility of the required information: the information must be such that the respondent is able to convey it to the interviewer. In other words, the required information must be expressible without any embarrassment to the respondent. The second is cognition, by which is meant understanding on the part of the respondent of his role and of what is required of him. The respondent must know and understand what types of information are required and in what terms of reference he should express the required information. It is often found that the respondent goes off the point and, where this happens, it is the interviewer's job to teach the respondent his appropriate role so that he may return to the right track. The third is motivation of the respondent. The respondent must be motivated to give accurate answers because a highly distorted answer is no better than no answer at all. Where the respondent is motivated, he would tend to co-operate with the interviewer. Where the respondent lacks motivation, the interviewer should try to build up those factors which increase his motivation.

The success of personal interview is dependent upon the satisfactory fulfilment of these three factors. Apart from this, the success of an interview is largely dependent upon the interviewer's personality. An interviewer cannot be regarded as merely a means of extracting information. His personal bias and attitude may affect the required information. It is, therefore, suggested that the interviewer must be a trained person so that he may be able to ask probing questions in an impartial way and also be able to exhibit a permissive attitude throughout the interview. A detailed discussion of the interview technique is done in Chapter 12.


Mail Questionnaire 


Mail questionnaire (or survey) is one of the most common types of survey methods used in educational and sociological researches. As its name implies, the questionnaire consisting of several items designed to elicit the required information is prepared and mailed to the respondent with the request to return it after answering all the items. Thus the mail questionnaire appears to be a direct means for obtaining information from every respondent. There are several advantages of a mail questionnaire.

First, the mail questionnaire is less costly than an interview. The cost of mailing questionnaires, in which postal expenditures take the place of interviewers' expenditures, is generally less than the cost of interviewing the respondent.

Second, by the use of mail questionnaires a widely scattered population can be surveyed rapidly with less expenditure. Thus, a mail questionnaire is a quick method of survey in the case of a scattered population.

Third, interviewers' errors, particularly their personal bias, may affect the reliability and the validity of the survey. The use of a mail questionnaire avoids these errors of the interviewers.

Fourth, the problems involved in contacting the respondent are avoided in the mail questionnaire.




There are ten principal disadvantages of the mail questionnaire.


1. The main problem is the nonresponse from the respondent. Generally, the percentage of response to a mail questionnaire is very poor, although Scott (1971) has mentioned in his reports of five mail surveys carried out by the Government that the percentage of response was in the region of 90. In a mail questionnaire it is not the loss of sample numbers (due to the nonresponse of the respondent) which is serious, but rather the probability that the personality characteristics of the nonrespondents differ from the personality characteristics of the respondents, so that any inference based upon the latter is likely to be automatically biased. There can be two suggestions for improving the response rates of mail surveys. First, no awkward and/or embarrassing question should be asked because even a single awkward question is likely to produce a high rate of nonresponse. Second, the mail questionnaire should be accompanied by a stamped self-addressed reply envelope because many respondents do not wish to pay the postal charge and, for this reason, may not reply. Besides, a third suggestion may also be made for improving the response rate of the mail questionnaire. If the design and the purpose of the study permit, those persons who are less educated, less interested and belong to the lower socio-economic status should not be included in the sample because evidence indicates that such persons do not reply to the questionnaire mailed to them.
2. Married women, especially those having above-average numbers of children, have a higher rate of nonresponse.

3. The mail questionnaire is an ineffective technique of survey where the objective and purpose of the survey need sufficient explanation for their complete understanding.

4. The mail questionnaire is an effective technique where the questions to be asked are simple and straightforward so that they may be understood with the help of printed instructions. Where the questions are difficult, complex and technical, the mail questionnaires become useless.

5. When it is desirable to probe the respondents deeply or to talk with them, the mail questionnaires serve little purpose.

6. Mail questionnaires are inflexible techniques of survey. In other words, in a mail 
questionnaire there is no way to check the validity of the answers, to clarify the vague answers 
and to know the reasons behind the unwillingness to answer a particular question. 

7. In a mail questionnaire, the different answers cannot be treated as fully independent because the respondent usually reads the whole questionnaire before he starts answering the questions.

8. In a mail questionnaire, the surveyor has no means by which he can be sure that the right person has answered the questions. Sometimes, it has been noticed that the questionnaire is completed by a person other than the right person. According to Scott (1971), there are two types of situations in which this usually happens. The first situation is one in which the questionnaire contains many questions which do not apply to the respondent. In this situation the respondent falsely thinks that the questionnaire is not meant for him and, therefore, that it should be passed on to persons who are more appropriate for answering the questions. The second situation is one in which the respondent gives little importance to the questionnaire and, therefore, thinks that it matters little who responds to it.

9. When spontaneous answers are needed, mail questionnaires are not considered to be appropriate.


10. In a mail questionnaire, there is no way to supplement the respondents' answers with observational background data. The researcher cannot observe the mode of expression of the respondents, their reaction to typical questions and their general attitude towards the survey. All these observational data are lacking in the case of a mail questionnaire.
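
The first disadvantage in the list above argues that nonresponse is serious less because of the lost sample numbers than because nonrespondents may differ systematically from respondents. A minimal Python sketch can make that point concrete. It assumes the usual decomposition of the full-sample mean into a respondent part and a nonrespondent part; the function name and the attitude scores are hypothetical illustrations, not figures from Scott (1971).

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean as an estimate of the full-sample mean.

    full mean = r * mean_R + (1 - r) * mean_NR, therefore
    bias = mean_R - full mean = (1 - r) * (mean_R - mean_NR).
    """
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must lie in (0, 1]")
    return (1.0 - response_rate) * (mean_respondents - mean_nonrespondents)


# Hypothetical attitude scores on a 1-5 scale: respondents average 3.8,
# nonrespondents (if they could be measured) would average 3.2.
bias_at_30_percent = nonresponse_bias(0.30, 3.8, 3.2)
bias_at_90_percent = nonresponse_bias(0.90, 3.8, 3.2)
print(round(bias_at_30_percent, 2), round(bias_at_90_percent, 2))
```

Under these assumed figures the bias shrinks as the response rate rises, and it vanishes entirely only when respondents and nonrespondents do not differ or when everyone replies.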





Some of the disadvantages of a mail questionnaire are such that they can be removed if they are combined with interviewing. For example, the questionnaire can be mailed and collected by the interviewers, or interviewers can personally deliver questionnaires to each respondent, who can mail them after completing them. When the interviewers collect the questionnaires, the problem of ambiguity of the questions or printed directions can be easily solved. Similarly, through personal delivery the interviewers may solve the problem of locating the respondents, because an interviewer can afford more time and take more pains than a postman in locating a respondent's house or flat.

Panel Technique

Some survey techniques require successive interviews with the same sample. The panel technique is one of them, where the re-interview design is used and the same sample is interviewed more than once. Where the purpose of the survey is wide and extensive, multiple interviews are taken with the same sample. But where the objective of the survey is less extensive, two interviews are sufficient. The panel technique has two advantages.

First, the panel technique enables the investigator to know how the various factors bring changes through time in the attitudes of the sample being studied.

Second, when the same sample is interviewed twice (or more than twice), it becomes a more sensitive and accurate measure of change than when two different samples from the same population are tested.

The panel technique has, however, two important limitations.

First, in the panel survey there occurs a loss of the sample being studied. The loss may occur due to any of the three factors, namely, death, refusal to be reinterviewed, and moving from one place to another. The loss of a certain proportion of the sample naturally increases the probability of serious bias in the study.

Second, re-interviewing sometimes tends to sensitize the sample to the extent that the individuals refuse to give the desired response and may act as if they are no longer representatives of the population to which they belong.
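
The claim that re-interviewing the same sample measures change more sensitively than testing two different samples can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than a procedure given in the text: simple random sampling, the same standard deviation at both interviews, and a hypothetical correlation `r` between a respondent's first and second answers. The function names are invented for this example.

```python
import math


def se_change_independent(sd, n):
    """Standard error of the change in means for two independent samples of size n."""
    return math.sqrt(2.0 * sd ** 2 / n)


def se_change_panel(sd, n, r):
    """Standard error of the mean change when the same n respondents are
    interviewed twice and their two answers correlate r."""
    return math.sqrt(2.0 * sd ** 2 * (1.0 - r) / n)


# Hypothetical values: 400 respondents, unit standard deviation, r = 0.6
print(round(se_change_independent(1.0, 400), 4))
print(round(se_change_panel(1.0, 400, 0.6), 4))
```

Whenever the two sets of answers are positively correlated, the panel standard error is the smaller of the two, which is the sense in which the panel is the more sensitive measure. Sample loss between interviews, the first limitation above, reduces `n` and erodes this gain.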


Telephone Survey 


Telephone survey is another form of survey research. In this survey, the respondent is interviewed by the investigator on telephone. The investigator calls a respondent on telephone, asks questions and records answers. The investigator samples respondents from lists or telephone directories. Recently, some computer-assisted technologies have been developed for making the telephone survey smoother. Two such popular techniques are Computer-Assisted Telephone Interviewing (CATI) and Interactive Voice Response (IVR). In CATI, the interviewer sits in front of a computer and makes calls. The interviewer, wearing a headset and microphone, reads the questions from a computer screen for the specific respondent who is called. The interviewer records the answers via keyboard. Once the answer has been entered, the computer shows the next question on the screen. CATI has the advantage of speeding up the interview and reducing the interviewer's errors. It also eliminates the separate step of entering information into a computer and speeds up data collection. With IVR, a respondent listens to the questions and response options over the telephone and the responses are recorded through touch-tone entry or voice recognition software. IVR has advantages like rapid and automated data collection, few errors and high anonymity. Researchers have shown that IVR is relatively successful for short and simpler surveys but suffers from drop-off in longer surveys or questionnaires (Tourangeau et al. 2002). This type of survey has the advantage of being quick and speedy in collecting information about the respondents. But the technique has several disadvantages. When the investigator is not known to the respondents, they usually do not co-operate and answer only simple and straightforward questions. In a country like India, the telephone survey has one additional disadvantage: only a limited section of the population has telephone facilities and, therefore, the chance is that not all respondents can be contacted on telephone. In such a situation, the telephone survey automatically defeats its purpose.

Advantages and Disadvantages of Survey Research 


Survey research is one of the popular methods of research in behavioural sciences. It has some advantages and disadvantages. Its major advantages are given below.

1. Survey research has wide scope. In other words, through survey research a great deal of information can be obtained by studying the larger population. Although conducting a survey research is more costly than conducting a field experiment or a laboratory experiment, still, in view of the quality and amount of information rendered by a survey research, it can be taken as an economical method of research.


2. Survey research is more accurate. As Kerlinger (1986, 387) has put it, "The accuracy of properly drawn samples is frequently surprising, even to experts in the field. A sample of 600 to 700 individuals or families can give a remarkably accurate portrait of a community, its values, attitudes and beliefs."

3. Survey research has been frequently used in almost all the social sciences. Hence, the method has interdisciplinary value. In fact, such researches provide raw materials for a vast and increasing 'cross-disciplinary research' (Campbell & Katona 1953).


4. Survey research is considered a very important and indispensable tool for studying social attitudes, beliefs, values, etc., with maximal accuracy at an economical rate.
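
The figure of 600 to 700 respondents quoted from Kerlinger in the second advantage can be checked with the standard margin-of-error formula for a proportion. The Python sketch below is an illustration only. It assumes simple random sampling from a large population, a 95 per cent confidence level (z = 1.96) and the worst case p = 0.5; none of these assumptions is stated in the quotation itself, and the function name is invented for this example.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate half-width of the confidence interval for a proportion
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1.0 - p) / n)


for n in (600, 700):
    # Expressed in percentage points
    print(n, round(100 * margin_of_error(n), 1))
```

Under these assumptions a sample of this size estimates a community-wide percentage to within about four percentage points, whatever the size of the community. Clustered or otherwise complex sample designs would generally widen the margin.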


Despite the advantages, survey research has some disadvantages also as mentioned below. 


1. Survey research remains at the surface and does not penetrate into the depth of the problem being investigated. There are other types of research which are preferred to survey research because they make deeper exploration of relations.

2. Survey research has some practical problems. Such researches are time-consuming and demand a good amount of expenditure. This is true especially where a large survey is to be conducted in which interviews and schedules require skill, time and money.

3. Although it is true that survey research is accurate, it is still subject to sampling errors. In survey research there is always the probability of one chance in twenty or a hundred that an error, more serious than minor fluctuations of chance, may occur and distort the validity of the result obtained.

4. It has also been observed that some techniques of survey research, such as the survey interview, make the respondent too sensitive and, hence, take the respondent out of his own social context. This usually invalidates the results of survey research.

5. Survey research demands expertise, research knowledge and sophistication on the part of the researcher. In other words, the competent researcher must know the technique of sampling, questionnaire construction, interviewing, analysis of data and other technical know-how of the survey. Most survey researchers do not possess these qualities in the required amount. This undermines the quality of survey research.

6. Another problem in survey research is concerned with response bias, the most common of which is called a social desirability bias. It has been observed that sometimes people respond to a survey question in a way that reflects not how they truly feel or what they really believe but how they think they should respond. That is, they tend to create a positive image of themselves, one that is socially desirable.

7. Still another limitation of survey research is concerned with the contents of the items contained in the survey. Questions may be vague ones which can produce misleading results.


8. A final problem with survey research is not methodological but ethical. We know that, on the basis of survey research, decisions affecting the lives of people are made, and if the surveys are flawed in any way, people can be hurt.

Despite these disadvantages, survey research has been a popular option not only for a psychologist but also for a sociologist, an educationist, an economist and a political scientist.

Steps in Conducting Survey Research

The basic steps involved in conducting survey research can be divided into the following six steps:

Step 1: The first step in survey research can appropriately be called the planning phase. This is the phase where the researcher develops an instrument, namely a survey questionnaire. He decides on the type of survey (mail, interview, telephone, etc.), and decides on the response categories and the design layout. The questions are written by the researcher for clarity as well as for completeness and, finally, they are organized according to the type of survey, the respondents and the nature of the study. While preparing the questionnaire, the researcher also thinks ahead as to how he will record and organize the data for analysis. He may conduct pilot testing of the questionnaire with a small set of respondents similar to those to be included in the final survey.

Step 2: Here the researcher plans how to record data. He also pilot tests the questionnaire with a small set of respondents similar to those in the final survey research. If the researcher has decided to use interviewers, he trains them with the questionnaire. They ask respondents in the pilot study whether the questions were clear and explore whether their intended meaning was also clear.

Step 3: At this stage, the survey researcher decides on the target population and selects a sample from it. He also decides about the number of respondents to be included in the sample.

Step 4: Here the researcher locates respondents. He uses the desired tools of survey such as interview, mail, web, telephone, etc. He carefully collects data with the help of the selected tool. Every attempt is made to keep the collection of data free from factors that may affect the validity of the research.

Step 5: At this stage, the collected data are analyzed. Before analysis, all data are checked. Nowadays, computer analysis provides a better and quicker means of analysing the recorded data.

Step 6: This is the final stage where findings are discussed and reported in proper format. Thus, the findings are presented to others for their critical evaluation.


Document or Content Analysis 


This method has been discussed in Chapter 12.


The Case Study

The case study is one of the important types of nonexperimental or descriptive research. Case study refers to an in-depth study of one situation or case, which may be one subject, group or event. A case study is really different from an experiment because the former does not involve manipulation of any independent variables.

If we trace the history of the case study method, it becomes obvious that Frederic Le Play introduced the method into social science research in his studies of family budgets. William Healy adopted the case study method in his work with juvenile delinquents. Sigmund Freud also used the case study method in the field of psychiatry, especially in his effort to treat his psychoneurotic patients. The famous case history of Sergei Pankejeff, 'The Wolf Man', written in 1914 and published under the title From the History of an Infantile Neurosis, is one of the best classic examples of his use of case study. According to Freud, these case studies confirmed his hypothesis, which provided a base to psychoanalysis as a method of treatment. The case study method is also employed by historians, anthropologists, ethnologists and psychologists. Historians have used the case study method for providing descriptive accounts of persons, eras and nations. Anthropologists and ethnologists have used the case study method for making a detailed description and comparison of primitive and modern cultures. In the field of psychology, exhaustive case studies have been done by Murray and his associates at the Harvard Psychological Clinic. Their case studies included procedures such as interviews, conferences, conversations, dramatic productions, tests of abilities, reactions to frustration, imaginal productions, etc. There is some evidence that even novelists and wartime correspondents tend to sketch characters of persons and families by using the case study method.

The purpose of the case study method is to understand the important aspects of the life cycle of the unit. In fact, such a study deeply analyzes and interprets the interactions between the different factors that influence the change or growth of the unit. Thus, it is basically a longitudinal approach which studies the unit over a period of time. A review of literature in this field reveals that case studies are not confined to the study of individuals and their important behavioural characteristics; rather, case studies have been made of all types of communities and individuals. Whatever the type of individual or community is, the element of typicalness, rather than uniqueness, is the focus of attention in the case study. Therefore, a case in the case study, as Bromley (1986) has suggested, is not only about a 'person' but involves a 'category of individuals'. In the light of this suggestion, the selection of the subjects for the case study needs to be done very carefully so that it may be assured that each subject is typical of those to whom the generalization is to be made (Best & Kahn 1992).

In a case study, data are gathered through several methods or techniques. Some of the important ones are as follows:

(i) Observation of behaviour characteristics and social qualities of the unit by the researcher

(ii) Use of questionnaires, opinionnaires, inventories, checklists and other psychological tests

(iii) Analysis of recorded data from newspapers, schools, clinics, courts or other similar sources

(iv) Interviewing the subjects, their friends and relatives, and others
From the aforesaid discussion, the following major features of the case study can be isolated:

(i) The case study is an approach which views a social unit as a whole.

(ii) The social unit need not be an individual only but it may be a family, a social group, a social institution or a community.

(iii) In case study, the unitary character of the social unit is maintained. It means that the social unit, whatever it is, is studied as a whole.

(iv) In the case study, the researcher tends to study the aspects of 'what' and 'why' of the social unit. In other words, here the researcher not only tries to explain the complex behavioural pattern of the social unit but also tries to locate those factors which have given rise to such a complex behavioural pattern.

(v) Since case study is a descriptive research, no variables are manipulated here.

(vi) In case study, the researcher gathers data usually through methods of observation, interview, questionnaire, opinionnaire and other psychological tests. Analysis of recorded data from newspapers, courts, government agencies and other similar sources is also undertaken.





Based upon the number of individuals, the case study may be divided into two types: the individual case study and the community case study. In the individual case study, the social unit consists of one individual or person. Since there is only one individual, it emphasizes analysis in depth. The individual case study may be fruitful in developing some hypothesis to be tested but it is not useful in making broad generalizations. The individual case study is a time-honoured procedure in the field of medicine and medical researches. The community case study is one in which the social unit is not a person but a family or a social group. Such a case study is a thorough observation and analysis of a group of people who are living together in a particular geographical territory. The community case study tries to deal with different elements of the community life such as location, prevailing economic activity, climate and natural resources, historical development, social structure, life values, health education, religious expression, recreation, impact of the outside world, etc. The early community studies have been reported by Lynd and Lynd (1929, 1937), West (1945), Drake and Cayton (1945), and Warner and Lunt (1941). Hollingshead (1949) studied the life of adolescents in a small Illinois community in Elmtown's Youth. Lucas studied the way of social life in three Canadian communities in Minetown, Milltown, Railtown: Life in Canadian Communities of Single Industry.

On the basis of the purpose, a case study may be subdivided into two categories: deviant case analysis and isolated clinical case analysis. In deviant case analysis, the researcher starts with a difference already found between two persons or groups of persons and his task is to read backward to deduce the condition that might have produced the difference (Warwick & Osherson 1973). In isolated clinical case analysis, the emphasis is on the individual units with respect to some analytical problem. Such a study has been popular in psychoanalysis. Freud's study of Little Hans is a well-known case which can be cited as an example of isolated clinical case analysis. Freud's theories of psychoneurosis were formulated through accumulation of many isolated clinical case studies of individuals.

The case study has some advantages and limitations. Some of the major advantages are as follows:

(i) The case study is a mode of organizing data in terms of some chosen unit such as the person's life history, the history of a group or society, and some delimited social processes. This has the advantage of intensive study of the social unit.

(ii) According to Goode and Hatt (1981), the case study method provides sufficient basal facts for developing a suitable hypothesis regarding the social unit being studied. This is possible because of the in-depth analysis of the concerned social unit.

(iii) In case study, the researcher gets sufficient facts for making a comparison between two similar social units.

(iv) Goode and Hatt (1981) are of the opinion that the case study provides opportunity for a careful examination of all those relevant facts and data on the basis of which a questionnaire or an opinionnaire or any psychological test is to be developed.

Despite these advantages, the case study has some limitations as stated below.

(i) The basic limitation of the case study is the response of the researcher himself. Here the researcher develops a false sense of certainty about the conclusions arrived at (Goode & Hatt 1981). The consequences of this feeling of certainty are the temptation to ignore the principles of research design as well as a failure to make explicit just what are the generalizations underlying the analysis of the cases. For example, suppose a researcher has collected 250 cases of drug addiction from different cities. He will have a strong temptation to feel that he does have an adequate sample, no matter how much knowledge he has about sampling design. Likewise, he may be in a difficult position to be explicit about the generalization because of the conflicting trends of behaviour of the sample.










(ii) The case study method looks deceptively simple. For effective and proper use of the case study method, it is essential that the researcher be thoroughly familiar with the existing theoretical knowledge of the concerned field and be skilful in isolating the important variables from many others that are unimportant and irrelevant.

(iii) The case study is a costly method in terms of time and money. It is well known that each case becomes a research in itself and the collection of even 30 cases may consume at least a year. Not only that, there may be a loss of potential cases from the sample drawn due to such a long period of time. Since each case is independently analyzed, it not only consumes time but also a great deal of money from the research fund.

(iv) The subjective bias of the researcher is a constant threat to objective data-gathering in case study. As a consequence, the conclusion loses its dependability and the validity of the study becomes questionable.

(v) The case study is susceptible to what is called the post hoc fallacy, in which effects are wrongly attributed to factors that are simply associated. In other words, a cause and effect relationship is not established in the case study method.

(vi) As longitudinal studies, case studies are confounded by the occurrence of many factors and, as archival studies, they may provide data with poor reliability and validity.

Despite these limitations, the case study method is a useful method of organizing research 
observations in social sciences. 


Ethnographic Studies

Ethnographic study is a nonexperimental or descriptive research which became popular in the latter part of the nineteenth century. It is sometimes known as cultural anthropology or, more recently, as naturalistic inquiry. Ethnographic study is a method of field observation, that is, observation of behaviour in a natural setting. Originally, it consisted of participant observation, conversation and the use of informants to study the cultural and social characteristics of primitive people. Nowadays, such observation and conversation have been extended to a number of different social groups also. In the beginning ethnographic study was confined to such primitive people as African, South Sea Island and American Indian tribes whose numbers were small and who were geographically and culturally isolated. Major emphasis in such a study was upon language analysis, marriage, child-rearing practices, religious beliefs and practices, social relations, political institutions, etc. In ethnographic study the researcher lives with the tribe and data are collected through observation of patterns of action and of verbal as well as nonverbal interactions between members of the tribe as well as the researcher and his or her informants.
For successful and effective conduct of the ethnographic study, the following three suggestions have been given:

(i) The researcher should personally go to the people of the tribe, live with them for a long period of time and become an integrated member of the social group.

(ii) The researcher should have the skill to interpret observation in terms of the tribe's concepts, feelings and values while at the same time supplementing his own judgement in making an objective interpretation of observation. He should also learn the language of the tribe in order to have better adjustment with the people.

(iii) The researcher should be trained, or at least he should train his informants, to record field data in their own language and cultural perspectives.

The ethnographic study has the following two fundamental assumptions:

(i) The most important behaviour of the individuals in a group is a dynamic process of complex interactions and consists of more than a set of discrete incidents.

(ii) Human behaviour is directly influenced by the setting in which it occurs. The researcher, therefore, must know and understand that setting and the nature of the social structure.

Mead (1928), in her classic ethnographic study, Coming of Age in Samoa, studied the development of 53 adolescent girls in Samoan society and compared it with that of adolescent American girls. She reported that there were no basic differences in the physical process of adolescent growth between Samoan and American girls. Due to cultural restraints, American girls had to face relatively more difficulties during this period.

Recently, ethnographic study has also been conducted over some educational issues, thus shifting its emphasis from the primitive tribe. The study conducted by Morris et al. (1984) is an example. In this study, school principals were observed. In fact, the researchers were interested in determining what principals do and how much time is spent in those activities. Each principal was observed for 12 full workdays. The researchers followed the principal wherever he went and wrote down whom the principal interacted with and by what means, that is, face to face, by telephone, by written word, and the like. They also noted the topics which consumed the principal's time and whether each conversation was spontaneous or planned. Morris and his colleagues reported in their conclusion that the principals spent less than half their workday in offices and that they often acted in a discretionary style in their decision-making. Above all, the principals' behaviour directly affected four major areas: teachers and students, parents and others in the community, superiors, and the principals themselves.
Ethnographic study has some advantages and limitations. The important advantages are mentioned below.

(i) Ethnographic study is conducted in a real-life setting and natural behaviour is observed. The researcher gets inside the minds of the people while at the same time interpreting the behaviour. As such, the facts arrived at in the conclusion are more dependable and reliable.

(ii) The external validity of ethnographic study is generally high. That is why its generalization is valid and sound.

(iii) The ethnographic study is free from the constraints of more conventional research procedures.

The important limitations of ethnographic study are as follows:

(i) For effective conduct of the study, neutrality is essential to objective participant observation. Sometimes, however, researchers or their informants fail to maintain the position of neutrality and are overwhelmed by the strong feelings and emotions of the subjects. This defeats the basic purpose of the study and invalidates the objective conclusion of the study.

(ii) The study requires much time and patience on the part of the researchers because they have to live with the people and/or observe their behaviour in a real setting.

(iii) Such a study requires that the researcher must be a trained person, capable of interpreting observations in terms of the tribe's concepts, feelings and values while at the same time supplementing his own objective judgement in interpreting observations. In the absence of such qualifications, the researcher is considered not fit for observation.

Despite these limitations, ethnographic study is considered a popular method of research in behavioural sciences.


DIFFERENCE BETWEEN RESEARCH METHOD AND RESEARCH METHODOLOGY

The difference between the two related terms, that is, research method and research methodology, must be clearly understood and explained. By research method, we mean all those methods and techniques which are used in conducting research. In other words, research methods refer to all those methods which are used by the researchers in conducting the research. Broadly, the various research methods used in conducting a research can be put into the following three categories:
(i) The first category consists of those research methods that are concerned with the collection of data. These methods are generally used when the data already available is not sufficient to arrive at the required solution.
(ii) The second category consists of those statistical techniques which are used for finding a relationship between the data and the unknowns.
(iii) The third category consists of all those methods that are commonly used to evaluate the accuracy of the results obtained.

Research methodology may be understood as the science of studying how the research is conducted. It is mainly a way to systematically solve the research problem. For the smooth conduct of research, it is essential that the researcher must know not only the research methods but also the methodology, which obviously contains the various steps that are generally adopted by the researcher in studying the research problem, with a suitable rationale. To know research methodology, the researcher should not only know how to develop certain tests or how to calculate various statistics like mean, standard deviation, etc. He should also know which of the various techniques are relevant and which are not, what the major assumptions underlying the techniques being used are, and the dangers and precautions needed in implementation of the techniques. Research methodology also specifies why a research problem has been undertaken, in what way the hypothesis has been formulated, and why a particular statistical technique for analyzing data has been adopted. The researcher should design his methodology for his own problem because research methodology will differ from problem to problem.

Now a comparison between research methods and research methodology reveals the following features:
(i) The scope of research methodology is wider than that of research methods. In this sense, when we talk of research methodology, we not only talk of the research methods but also consider the rationale behind the methods to be used and explain why we are using a particular method/technique and why we are not using the other method/technique.
(ii) Research methodology contains many dimensions and research methods are simply one of such dimensions.
(iii) Research methodology is broadly a science in itself but research methods constitute simply one aspect of this broad subject.


ETHICAL PROBLEMS IN RESEARCH
The ethical problem in behavioural researches, particularly in the field of psychology, sociology and education, is a vital one because the researchers here conduct experiments mostly on human beings and animals, who may be embarrassed, frightened or hurt by the nature of the experiment. To solve this problem, agencies like the American Psychological Association (APA), the American Society for the Prevention of Cruelty to Animals (ASPCA) and the Office of Education have laid down guidelines for research with human beings and animals, which have been welcomed all over the world. The guidelines to be followed in animal researches are enumerated below:


1. All animals used in experimentation should be lawfully possessed.
2. Every effort should be made to avoid bodily discomfort to animals subjected to experimentation. Researches, where discomfort to animals is essential, must be done under the proper care of veterinary experts.
3. All animals must be fed properly during the experimentation with due regard to a sanitary environment.
4. Where surgical operations are to be done for fulfilling the purpose of the research, they must be done under suitable general anaesthesia or local anaesthesia, maintained throughout experimentation.
5. The animals must be provided with post-operative care in a hygienic environment.
As we know, research contributes to the advancement of knowledge and ultimately to the betterment of human beings. It is essential that human beings fully participate and cooperate with the researcher, but not at the cost of human rights. Berg (1954) has enunciated three important elements of ethical research with human beings. These elements are consent, confidence and standard procedure. The first ethical consideration is that the person must give his consent to act as a subject. He must give this consent after knowing the risks and discomfort involved in the proposed research. In case the person is mentally ill but is required to act as subject, consent must be obtained from the patient's physician or the patient's guardian.

The second ethical consideration, as Berg spells out, is confidence. The human beings, who act as the subjects, must have confidence in the experimenter in that the latter will not make their attitudes, reactions, and opinions public. It is a natural desire of human beings that others should not know their attitudes, opinions and/or performance on a certain task, and therefore it is essential that a human subject be given the confidence that others will not be allowed to know his performances either through the experimenter or through his publication.

The third important ethical consideration is that of the standard of acceptable practice. In other words, the procedure that is being followed by the experimenter must be a standard procedure. By standard procedure, we mean that the procedure must be "tried" (Berg 1954). The practice of following standard procedure may sometimes pose a new problem to the experimenter. Suppose the experimenter is trying out a novel and original procedure. In such a situation, the principle of following standard procedure may not be fully satisfied. However, it may be argued in case of a novel and original procedure, which is not standard, that it must be regarded as acceptable by others undertaking the research for the welfare of the subjects.
Some of the precautions that a researcher must take regarding human subjects may be enumerated as follows:

1. Where the possibilities for injurious after-effects exist, the researcher should inform the subjects in the beginning and the research should be started only when the subjects agree to participate.

2. When the design of the experiment is such that some intoxicating drugs (which are likely to cause an injurious effect) are to be administered to subjects, adequate facilities must be maintained for appropriate care of the physical and the psychological health of the subjects, and the drugs must be administered under the guidance of medical experts.

3. The researcher should safeguard the right to privacy of the subjects. Every subject has a right not to disclose certain information about himself. The use of questionnaire is common in behavioural research. Often, items of the questionnaire relate to religious feelings and personal habits which are not ordinarily disclosed by the subjects. In order to maintain the privacy of the subjects, the researcher should (a) assure the subjects that their privacy will be respected and (b) try his best not to put unnecessary questions.

4. The researcher should respect the right of the subjects to remain anonymous. By this right is meant that the identity of the subjects should not form the subject of research. For not invading this right of the subjects, the researcher may (a) identify the subjects not by their name but by the numbers assigned to them, or may (b) pool the data obtained by a group and report the averages. If the averages are reported, naturally, the individual data is ignored and then, the right to remain anonymous is not invaded.

5. The researcher should treat the data obtained from the subjects confidentially. Every subject has a right to insist upon the fact that the data should be treated in a confidential manner and should not be exposed to anyone who wants to see it. Usually, to ensure this right, the researcher (a) carries out the treatment of the data by numbers rather than by names, and (b) destroys the protocols when the study is over. Exception exists in the circumstances in which withholding information causes danger to the person or society, as well as in cases that have subpoenaed records.

6. The researcher should use deception wherever it is possible and essential. By deception is meant temporary withholding of information about the hypothesis of the research and other details from the persons who participate in it. In case of some psychological researches it is simply not possible to tell subjects everything which they could be told about the study, because if they had knowledge about the actual purpose of the research, they might alter the critical aspects of behaviour which are being studied by the researchers. Alternatively, it is sometimes simply impossible to study a particular psychological process without deliberately misleading the subjects. While the reasoning behind deception is sound, the use of deception also raises some important ethical issues. Is it justified and appropriate for psychologists to withhold information from research participants or even to mislead them? Although this issue remains somewhat controversial, many psychologists believe that deception is permissible provided two basic principles are followed.


The first principle involves obtaining informed consent, that is, providing the subjects who are going to participate in the research with as much information as possible about the major events and procedures a study will involve before they agree to participate in it. Such information must be coupled with a statement that they are completely free to leave at any time during the study.

The second principle is called debriefing, which means that the subjects should be given full information about all aspects of the research study, including deception, after they have participated in it. The purpose of debriefing is to help the subjects leave the session with a full understanding of its purposes and feeling at least as good as they were at the beginning of the research (Baron 2002). Latest evidence suggests that these two principles, namely, informed consent and debriefing, go a long way towards eliminating the adverse effect of deception (Mann 1994; Sharpe, Adair & Roese 1992). Still, exposure to deception may leave the participants with increased feelings of suspicion about what researchers tell them (Epley & Huff 1998).

Obeying these important ethical standards while conducting researches is considered mandatory for psychologists. Violation of these ethical standards may attract punishment by a court of law.
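The anonymity and confidentiality precautions described earlier in this section (identifying subjects by assigned numbers rather than by names, and reporting pooled averages instead of individual data) can be illustrated with a short sketch. This is only a hedged illustration, not a prescribed procedure: the function names `anonymize` and `pooled_average`, the record fields and the sample scores are all hypothetical and do not come from the text.

```python
import random
from statistics import mean


def anonymize(records, seed=None):
    """Return copies of the records with names replaced by code numbers."""
    rng = random.Random(seed)
    codes = list(range(1, len(records) + 1))
    rng.shuffle(codes)  # a code should not reveal the order of testing
    return [
        {"subject_no": code, "score": record["score"]}
        for code, record in zip(codes, records)
    ]


def pooled_average(records):
    """Report only the group average, never an individual score."""
    return mean(record["score"] for record in records)


if __name__ == "__main__":
    # Hypothetical raw data; real protocols would be destroyed after the study.
    raw = [
        {"name": "Subject A", "score": 12},
        {"name": "Subject B", "score": 15},
        {"name": "Subject C", "score": 9},
    ]
    coded = anonymize(raw, seed=1)
    print(sorted(r["subject_no"] for r in coded))
    print(pooled_average(coded))
```

In this sketch the coded records carry no names at all, so anyone handling the data for analysis sees only numbers, and the published figure is the group average alone.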


COMPARISON BETWEEN EXPERIMENTAL AND NONEXPERIMENTAL RESEARCH

Social scientific research takes basically two forms: experimental research and nonexperimental research. An experimental research is one where the researcher manipulates the independent variables and examines their effect under possible controlled conditions. In nonexperimental research, the investigator does not have direct control over the independent variables because either their manipulations are already there or they are inherently not manipulable. Whether it is experimental research or nonexperimental research, the researcher tries to verify the postulated relationship between independent variables and dependent ones. Despite this similarity, the following are the major points of distinction between the two:

1. In experimental research, the effect of independent variables upon dependent variables is examined. In other words, here the investigator introduces some changes or manipulations and subsequently examines their effect. But in nonexperimental research, the situation is reversed. Here the effect is before the investigator and, on the basis of that effect, he infers about the possible causes. In other words, in nonexperimental research the investigator, on the basis of dependent variables, infers about the possible independent variables.
2. In experimental research, the researcher has direct control over the independent variables. Whatever the type of manipulation he wants to introduce, he can do so readily and easily. In nonexperimental research the investigator has no direct control over the independent variables. As a consequence, he can't introduce changes of a desired sort in independent variables. The major reason for this lack of control is that in nonexperimental research, the independent variables have already been expressed.

3. In experimental research a sufficient degree of internal validity is found, whereas in nonexperimental research the level of internal validity is considerably lower.

4. In experimental research, the investigator can easily assign the subjects randomly into 
different conditions. Not only this, the various groups of subjects can also be randomly assigned 
to different treatment conditions. But in nonexperimental research, random assignment of 
subjects into different treatment conditions does not carry much weight. 

5. It is said that in experimental research when an event is brought into the laboratory or in 
some controlled situation, the nature of the event is changed and becomes different from those 
which occur in an everyday realistic setting. But in nonexperimental research, events don't 
change because they are studied in everyday realistic settings. 

Thus, experimental research is different from nonexperimental research in many respects, 
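The random assignment mentioned in point 4 above can be sketched briefly. This is a minimal illustration under stated assumptions, not the book's own procedure: the function name `randomly_assign`, the subject labels and the two condition names are hypothetical. Subjects are shuffled and then dealt out to the conditions in turn, so that every subject has an equal chance of landing in any condition and group sizes differ by at most one.

```python
import random


def randomly_assign(subjects, conditions, seed=None):
    """Shuffle the subjects, then deal them out to the conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, subject in enumerate(shuffled):
        condition = conditions[index % len(conditions)]
        groups[condition].append(subject)
    return groups


if __name__ == "__main__":
    # Hypothetical subject labels and treatment conditions.
    subjects = [f"S{i}" for i in range(1, 11)]
    groups = randomly_assign(subjects, ["treatment", "control"], seed=42)
    for condition, members in groups.items():
        print(condition, members)
```

In nonexperimental research no such step is available, because the subjects arrive already belonging to the conditions being compared.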


TYPES OF EXPERIMENT 
In social science researches various types of experiments are undertaken, keeping in view the 
purpose of the experiment. These experiments can be mainly of four types as noted below, 

1. Exploratory experiment: Exploratory experiment is one which is conducted in a situation in which little knowledge to formulate a possible solution to the problem exists. As a consequence, poorly framed hypotheses (tentative solutions) are the only guides for the study. In such experiments, there is no scientific basis for predicting the effect of the independent variable on the dependent variables. Thus, exploratory experiments are broadly conducted at the preliminary stage of investigation and their findings can be the basis for formulating a specific, precise hypothesis for further study. For example, if the experimenter observes that most of the hungry subjects learn the given list of nonsense syllables sooner, he may come to the conclusion that hunger facilitates learning. Such findings may be the basis for formulating a research hypothesis in future. It is thus obvious that exploratory experiments are conducted mainly to discover whether or not the independent variables are influential in affecting the given dependent variable. According to McGuigan (1990), a common descriptive term for exploratory experiment is "I-wonder-what-would-happen-if-I-did-this."

2. Confirmatory experiment: Confirmatory experiment is one in which an explicit hypothesis is subjected to various types of tests and is generally confirmed. According to McGuigan (1990), a common descriptive term for a confirmatory experiment is "I'll-bet-this-would-happen-if-I-did-this." In such experiments, the experimenter clearly states the hypothesis, and the findings of the experiment are used to test this hypothesis. If the findings support or confirm the hypothesis, the experimenter accepts the hypothesis as true. If the hypothesis is not in accordance with the findings, it is modified to fit the data better and then tested by a new experiment.





3. Crucial experiment: A crucial experiment is one that intends to test all possible hypotheses simultaneously. Ideally, a crucial experiment is one whose findings support one hypothesis and reject all possible alternative hypotheses. Since in any research problem it is ordinarily not possible for the experimenter to state all possible alternative hypotheses, the investigator can't conduct a true crucial experiment. However, the concept of crucial experiment is important as an ideal and the accepted view is that efforts should be directed to achieve that ideal.
4. Pilot experiment: Pilot experiment, also known as pilot study, is a preliminary experiment conducted prior to the major experiment. Broadly, pilot experiment is a sort of dress rehearsal of the major experiment. Pilot experiment is conducted with a small number of subjects. It tries to find out the fitness of the procedure being adopted, suggest the values to be assigned to the variables under study, and locate the likely mistakes that might be made in conducting the actual experiment later on. Students should clearly bear in mind that despite the word 'pilot' being generally associated with 'aircraft', 'pilot experiment' has nothing to do with the behaviour of aircraft operators, as some students may think.
A pilot study may be simple or complex. Its degree of complexity rests primarily on the standards, interests and budget of the researcher. It should also be noted that a pilot study may make or break a future research plan of larger scope. In view of what is revealed in the pilot study, a researcher may decide that little or nothing is to be gained through the implementation of a larger study. Of course, sometimes pilot studies show encouraging results and more concrete and sophisticated research plans are implemented subsequently.
There are some specific functions of a pilot study. Some of the important ones are as under:
(i) Pilot studies help in developing better approaches to target populations.
(ii) Pilot studies help the researcher to determine whether or not a more substantial investigation of the same phenomenon is warranted.
(iii) Pilot studies help to discover and ameliorate some problems associated with interviews, questionnaires, and the like.
(iv) Pilot studies help researchers to develop meaningful methods of categorizing the data to be collected.
These are the basic types of experiments usually undertaken in social scientific research. Of these experiments, confirmatory experiments are more important than the other types of experiments.


TYPES OF APPLIED RESEARCH 

As we know, applied research is a research designed to offer practical solutions to a concrete 
problem or address the immediate and specific needs of the society or organizations. There are 
three major types of applied research: Evaluation research, Action research and Social impact 
assessment research. A discussion on these is given below. 


1. Evaluation Research 

Evaluation research is the most popular type of applied research. Rutman (1977) used the term evaluation research to describe evaluation procedures which use rigorous research methodology. In this type of research the researcher tries to determine how well a programme or policy is working or reaching its goals and objectives. Such research tends to measure the effectiveness of a way of doing something. In other words, it aims to assess the effectiveness of different actions in meeting needs or solving problems (Punch 2004). Evaluation researchers use several research techniques (such as survey and field research) to evaluate a programme.

There are two common types of evaluation research: formative evaluation research and summative evaluation research. In formative evaluation research, there is built-in monitoring or continuous feedback regarding the different aspects of the programme being evaluated. A summative evaluation research is one where the researcher looks for final programme outcomes.

Evaluation research is a part of the administration of many organizations such as schools, colleges, government agencies, businesses, etc. This type of research greatly expanded in the 1960s in the United States of America when many new federal social programmes were created. In other countries, including India, such research is becoming popular day by day.

Ethical and political conflicts often arise in evaluation research because people tend to have opposing interests in the findings about a programme. Results of evaluation research may affect getting a job, building the popularity of a political party, etc. It has been observed that people who are personally displeased with the results of evaluation research often attack the researcher's methods by claiming them to be biased, sloppy, inadequate, etc. Besides these, sometimes evaluation researchers are pressurized to rig a study before they start work.

Evaluation research has some limitations besides the difficulties mentioned above. First, raw data are rarely publicly available. Second, the organization can selectively use or ignore reports of the evaluation research. Third, in evaluation reports the focus is narrowed to select inputs and outputs rather than the full process by which the programme can affect people's lives.


2. Action Research
The term action research was first coined by Kurt Lewin in 1944. The fields of social psychology, industrial/organizational psychology and education have shown interest in what has been called action research. Action research is one where the focus is on immediate application rather than on development of theory or on generalization of applications. Here researchers give emphasis to those problems which originate in a local setting. In fact, its findings are evaluated in terms of local applicability and not in terms of universal validity. Action research follows two traditions: the British tradition tends to view action research as simply a means of improvement and advancement of practice (Carr & Kemmis 1986), and in the American tradition it is viewed as a systematic collection of data that, in a way, provides for social change (Bogdan & Biklen 2007).

An action researcher treats knowledge as a form of power and tends to abolish the line 
between research and social action. Such researcher assumes that knowledge develops from 
experience. He also assumes that people can become aware of conditions and learn to take 
actions for bringing about improvement. 

There can be different types of action research and most of them share common 
characteristics. Some of the common characteristics are as under; 


(i) Action research incorporates ordinary or popular knowledge. 

(ii) The action researcher tends to raise consciousness by expanding public awareness.

(iii) Those who are studied participate in the research process.

(iv) Action research tends to focus on power with a goal of empowerment. Such researchers 
try to equalize power relations between research participants and themselves and avoid 
having more control, status and authority than the participant. 

(v) Action research is tied directly to the actions. 

(vi) Since in action research the goal is to improve the conditions and lives of the 
participants, publishing it in terms of articles and books becomes a secondary matter. 


In a nutshell, in action research the participants tend to assume an active role in formulating, designing and carrying out research, and the researcher also tries to cogenerate knowledge through collaborative processes that continuously incorporate the diverse experience of local groups.

Keeping the above facts in view, Kemmis and McTaggart (2003) have defined action research as collective self-reflective inquiry undertaken by the participants in various social situations for improving the productivity, rationality and justice of their own practices as well as their understanding of these practices. Thus action research has a philosophical element (concerned with the role of doing or acting in knowing), a political element (where the focus is on the political aspects of knowledge production) and a critical social science element (where the focus is on the empowerment of the participants who are the subject of the research).

3, Social Impact Assessment Research 

Social impact assessment research is a type of applied research that is more popular in social psychology and sociology. It documents the likely consequences for various areas of social life if any major change is introduced in the community (Neuman 2006). Such research aims at evaluating the likely consequences of a planned change. Researchers conducting social impact assessment research examine many outcomes and often work in an interdisciplinary spirit. Seven principal areas are assessed by social impact assessment research:

• Community service (such as school enrolments, police responses towards public complaints)
• Economic impact (such as changes in levels of income, failure of profit rate, etc.)
• Social conditions (such as crime rates, elderly people's ability to care for themselves, etc.)
• Environment (such as changes in noise levels, pollution rate, etc.)
• Demographic consequences (such as population movement, changes in the mix of old and young people in the area)
• Health outcomes (such as changes in the occurrence of diseases or the presence of some harmful substances, etc.)
• Psychological well-being (such as changes in the level of stress, self-esteem, self-efficacy, etc.)

Several researches done by different social scientists have revealed that although social impact assessment research is a powerful type of applied research, most of the time its predictions made in different areas have not been borne out.


Review Questions 


1. Describe the characteristics of psychological research and analyze the various phases of research process.
2. Give a suitable definition of scientific research and explain the nature of scientific enquiry.
3. Compare and contrast laboratory experiment and field experiment. Illustrate your answer with the help of suitable examples.
4. What is 'research'? Discuss the special characteristics of psychological research.
5. Compare and contrast experimental research with nonexperimental research. Give relevant examples.
6. What is 'survey research'? Discuss the relative advantages and disadvantages of the different tools used in survey research.
7. Citing relevant examples, discuss the major steps or stages in psychological research.
8. Citing some examples, discuss the importance of different ethical problems in social science research.
9. What do you mean by case study? Discuss its advantages and limitations.
10. What is meant by ethnographic study? Describe its advantages and disadvantages.
11. Distinguish between basic research and applied research.
12. Describe, by means of an example, the major threats to internal validity as emphasized by Campbell & Stanley (1960).
13. Discuss the major reasons for carrying out experiments in natural settings.
14. In what sense can it be claimed that basic research and applied research have a reciprocal relationship to one another?
15. When is a pilot experiment preferred most?
16. Discuss the major types of applied research.



16 


SINGLE-SUBJECT EXPERIMENTAL RESEARCH
AND SMALL N RESEARCH


CHAPTER PREVIEW

• Meaning and Origin of Single-Subject Experimental Research
• General Procedures of Single-Subject Experimental Research
    Repeated Measurement
    Baselines or Operant Level
    Manipulating Variables
    Length of Phases
• Basic Designs of Single-Subject Experimental Research
    Withdrawal Designs
    Reversal Designs
    Multiple-Baseline Designs
• Data-Collection Strategies in Single-Subject Experimental Research
• Evaluating Data in Single-Subject Experimental Research
• Strengths and Weaknesses of Single-Subject Experimental Research
• Comparison between Single-Subject Research and Large N Research
• Small N Design: Nature and Historical Perspectives





MEANING AND ORIGIN OF SINGLE-SUBJECT EXPERIMENTAL RESEARCH 

The research strategies described in Chapter 15 were those in which a group or different groups of subjects generally participate in the experiment. In the present chapter, we are going to discuss research strategies in which a single subject (N = 1) participates.

Single-subject research, sometimes also referred to as single-case or N of one research, is that in which usually one individual subject participates and the effect of interventions is rigorously studied on the same individual over time. In other words, a single-subject research is a repeated-measure experiment conducted on one subject. While the focus of single-subject experimental research is the individual subject, most of these studies include more than one subject, such as two (Best & Kahn 1998). In this situation it is referred to as Small N research (Robinson 1976). Like any other experimental research, single-subject research is a method of testing hypotheses. In general, such research is used to test the hypothesis that a particular treatment will tend to have an effect on one or more behaviours. Single-subject research designs are very useful in clinical research, particularly in the area of behaviour modification and drug evaluation. Most, if not all, behaviour modification research utilizes the single-subject research design. But it should not be thought that it is limited to behaviour modification research only. In fact, single-subject research design can be applied to a variety of research topics. If we pay attention to single-subject research designs, it will be obvious that these designs are similar to the three quasi-experimental designs, namely, the time-series design, the equivalent time-samples design and the equivalent materials pretest-posttest design. The only difference is that the quasi-experimental designs are used with a group of subjects while single-subject research is concerned with individuals.

If we trace the origin of single-subject research design, we come to the conclusion that it developed from the case study method. According to Kazdin (1982), the development of single-subject research design, as it is being currently practised, is largely the outgrowth of the work of B F Skinner, who had rigorously conducted animal laboratory research to elaborate what is called operant conditioning. In fact, the research methodology of Skinner, popularly called experimental analysis of behaviour, contains many features that are characteristic of the single-subject research design of today. In many of his animal laboratory researches, he included one or a few subjects upon whom repeated observations or measurements were taken to note subtle changes in behaviour. In the 1950s, several investigators adopted Skinner's methodology to human beings and they successfully indicated the clinical utility of the experimental analysis of behaviour for autistic and mentally retarded children. As a consequence, a new field called applied behaviour analysis was born with its own journal, named Journal of Applied Behavior Analysis, first published in 1968. In this journal, most of the research articles reported the use of single-subject research design.


GENERAL PROCEDURES OF SINGLE-SUBJECT EXPERIMENTAL RESEARCH 


There are several aspects of the general procedures of single-subject experimental research which are worth mentioning. Some of these aspects are mentioned below:
1. Repeated measurement
2. Baselines or operant level
3. Manipulating variables
4. Length of phases
A discussion of these procedures follows.


1. Repeated Measurement 
One important aspect of single-subject research is the repeated measurement or observation of the same individual over time. The purpose of such repeated measurements or observations is to determine if changes introduced in the experimental condition, called intervention or treatment, produce changes in the subject. The more careful and systematic the repeated observations are, the more valid and reliable the data are likely to be.

To assure reliable and valid data, the measurements to be used must be clearly defined. The researcher must be careful in selecting the behaviours to be observed. The behaviours to be observed or measured must be such that the subject is ready to exhibit them, without any hesitation, with a reasonable degree of frequency. Sometimes, the measurement procedure includes tests, surveys, questionnaires, opinionnaires, attitude scales, etc. The researcher must select such instruments as can be used repeatedly without any contamination or test-interaction effect. For enhancing the reliability and validity of the single-subject research, the measurements must also be done under completely standardized conditions. Standardized conditions in which measurements are to be repeated include uniformity in the time of day, the circumstances and general surroundings such as location, presence of others, etc. A careful researcher must use the same measurement or observation procedures for each replication of the measurement.


2. Baselines or Operant Level 

The baseline, also called operant level, is one of the important aspects of the general procedure of the single-subject experimental research. In this research, a baseline is used to determine the status of the subject's behaviour prior to the intervention by the researchers. Baseline data are often gathered by observing the different aspects of the individual's behaviour under study several times prior to the intervention by the researchers. The baseline should be long enough for determining the trend of data. Generally, three types of trends are revealed by the baseline: a stable rate, an increasing rate and a decreasing rate. For better evaluation of the effectiveness of the intervention, the baseline must demonstrate a stable rate. If the baseline is showing an increasing rate or trend, a serious problem in evaluating the effectiveness of the intervention arises because the baseline, even prior to intervention, is showing a trend in the desired direction. A baseline is formed only after a minimum of three separate observations; however, five to eight or even more observations will be still better.
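
The three kinds of baseline trend can be illustrated with a minimal sketch. It assumes that a trend is judged from the slope of a least-squares line through the baseline observations; the function name baseline_trend and the tolerance value are hypothetical choices made for illustration and are not part of the procedure described in the text.

```python
from statistics import mean


def baseline_trend(observations, tolerance=0.1):
    """Classify a baseline as 'stable', 'increasing' or 'decreasing'.

    The slope of a least-squares line through the observations is
    compared with `tolerance`, an assumed, illustrative threshold.
    """
    n = len(observations)
    if n < 3:
        # The text asks for a minimum of three separate observations.
        raise ValueError("a baseline needs at least three observations")
    sessions = range(1, n + 1)
    x_bar, y_bar = mean(sessions), mean(observations)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(sessions, observations))
        / sum((x - x_bar) ** 2 for x in sessions)
    )
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "stable"


# Hypothetical counts of a behaviour over five baseline sessions.
print(baseline_trend([8, 7, 9, 7, 8]))
```

In practice, such a judgement is usually made by inspecting a graph of the baseline; the slope is only one convenient numerical summary of what the eye sees.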


3. Manipulating Variables

Like any other experimental research, a single-subject research involves the manipulation of variables. In single-subject research, the ideal condition is that only one variable should be manipulated at any given time. If two variables are manipulated at the same time, the effect of each can't be separated and the effect becomes uninterpretable. In such a situation the variables should be manipulated one by one and not all at a time. For example, if the researcher wants to know the effects of systematic desensitization and medication upon a particular behaviour and wants to proceed with a single-subject research design, he should first separate the two independent variables, that is, systematic desensitization and medication. The researcher may then follow the baseline with one intervention or treatment, say, systematic desensitization. After a period of time with systematic desensitization, the treatment or intervention should be removed and the baseline repeated. Following the second baseline, the researcher would introduce the second intervention, that is, medication, for some time. After this, the intervention would be removed and the third and final baseline would be introduced. If the researcher manipulated the two variables in the same phase, the effect of each could not be separated and the effect would remain an uninterpretable one.
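
The one-variable-at-a-time rule in the example above can be sketched as a short routine that lays out the order of phases. The function name phase_sequence is hypothetical and the sketch only produces the order of phases; it says nothing about how long each phase should last.

```python
def phase_sequence(interventions):
    """Return the order of phases for a list of interventions.

    The sequence starts with a baseline and returns to baseline after
    every intervention, so that only one independent variable is
    manipulated in any given phase.
    """
    phases = ["baseline"]
    for name in interventions:
        phases.append(name)
        phases.append("baseline")
    return phases


print(phase_sequence(["systematic desensitization", "medication"]))
```

For the two treatments named in the text, this yields baseline, systematic desensitization, baseline, medication, baseline, which is the sequence described in the example.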


4. Length of Phases
In single-subject experimental research, ordinarily, there are three phases: baseline, intervention, baseline. Now the question is: What should be the length of each of these phases? This has been a topic of heated discussion (Bijou et al. 1969; Barlow & Hersen 1984). Barlow & Hersen (1984) have opined that, as far as possible, the different phases should be of equal length. But this should not be accepted as a general rule for all situations. In fact, there may be a situation in which the first intervention has to be longer than the initial baseline in order to demonstrate an obvious change in behaviour. If this happens, the subsequent phases, that is, the second baseline and the second intervention, should be of the same length. However, there is one potential danger in having a longer intervention phase, that is, a carry-over effect is likely to occur. When the intervention phase is made longer, the effect of the intervention may continue into the next phase, that is, the withdrawal (second baseline) phase. This is called the carry-over effect. Bijou et al. (1969) have suggested that a short intervention period tends to prevent carry-over effects.

In conducting an experiment based upon a single-subject research design, the above aspects of the general procedure must be kept thoroughly in view.


BASIC DESIGNS OF SINGLE-SUBJECT EXPERIMENTAL RESEARCH 


In a single-subject experimental research, the following three types of basic designs are commonly used:
1. Withdrawal Designs
2. Reversal Designs
3. Multiple-Baseline Designs
A discussion of these designs follows.


1. Withdrawal Designs
Withdrawal designs are those in which the intervention or experimental treatment, introduced after the baseline period, is withdrawn. The basic withdrawal designs are the A-B design, the A-B-A design, the A-B-A-B design and the alternating-treatments design. It would be proper to start with the A-B design, which is the simplest research strategy in N = 1 research.

(i) A-B design: A-B design is the simplest research design for N = 1 research. In this design, a baseline (A) for a behaviour is established and subsequently, it is predicted that the behaviour would continue to exist if no treatment is administered. If, following the intervention (B), the behaviour departs from this prediction, the researcher may attribute such behavioural changes to the effect of treatment. The logic of A-B design is that the researcher predicts that behaviour would tend to continue along the same course as the baseline if a treatment or intervention is not introduced. If after the intervention the behaviour or the response measure changes noticeably, the researcher concludes that his intervention is effective in producing the change. This design is weak because of the following two reasons:
(a) The researcher does not know what the response rate might have been had no treatment been administered.
(b) The researcher does not know for certain whether any response change was produced by the specific intervention or treatment. It might have changed just because the researcher did something different. This is called the placebo effect, which is one of the important limitations of such designs.

(ii) A-B-A design: A-B-A design is a very important and popular design used in single-subject research. This design has a baseline (A), intervention (B) and baseline (A) sequence. Thus, this design has three phases, each of which represents a series of measurements. In this design the behaviour is studied to examine whether it changes from A (baseline, or control condition) to B, the treatment condition, and whether or not it comes back to the baseline (A) if the treatment or intervention is withdrawn. If the behaviour actually increases during treatment and then again decreases following withdrawal of intervention and thus comes to the level of the baseline (A), a sufficient reason is established for the fact that the response change is a function of manipulation of the independent variable or intervention. The A-B-A design is more convincing and powerful and yields more reliable and valid data because of the test as to whether or not the withdrawal of intervention does produce a return of the response measure to the baseline.

(iii) A-B-A-B design: When intervention is reintroduced after the withdrawal phase, this results in A-B-A-B design which, in fact, is analogous to the equivalent time-samples design, a kind of quasi-experimental design. In this design the operant levels, or baselines (A), and interventions (B) are each repeated twice. According to this design, the behaviour may change from the first baseline (A) to intervention (B), whereupon it may decrease following withdrawal of B (second A) and, finally, the behaviour may again increase with the reintroduction of the intervention (B). As a consequence of this, the functional relationship between the intervention (B) and the behavioural measures is strengthened. Thus, A-B-A-B design provides a better opportunity for careful examination of intervention effect than the simple A-B-A design. Kazdin (1982) has rightly opined that A-B-A-B design tends to examine the effect of treatment or intervention by alternating the baseline condition (A phase), when no intervention is done, with the intervention condition (B phase). He further says that the A and B phases are repeated to complete the four phases (A-B-A-B design). The impact of the intervention becomes clear if performance improves during the first intervention phase, reverts to the original baseline levels of performance when intervention is withdrawn and again improves when intervention is reinstated in the second intervention phase. Thus, in A-B-A-B design the researcher observes whether behaviour changes immediately after giving a treatment variable (first B), whether behaviour reverses when treatment is withdrawn (second A) and whether behaviour improves again when treatment is reintroduced (second B). Since treatment is removed during the second A stage and improvement in behaviour, if any, is likely to be reversed at this point, this design is also sometimes called reversal design.

A-B-A-B design can be explained by referring to an experiment reported by Harris, Wolf & Baer (1964). The experiment was conducted on a four-year-old boy who cried a great deal after experiencing even a minor frustration. It was determined that he cried about eight times during each school morning. The response rate (or crying rate) was studied for the first 10 days to establish the baseline (A). The experimental treatment (reinforcement) was then introduced for 5 days (B). In this experiment, the reinforcement was the special attention from the teacher that crying brought. For the next 10 days, the teacher withdrew the reinforcement, that is, he ignored the crying episodes of the boy (A). Rather, the teacher reinforced some other constructive behaviour by giving proper attention to the boy. It was found that during this withdrawal period the number of crying episodes sharply decreased and during the last 5 of these 10 days, only one crying response was made. During the next 10 days, the teacher again reinstated the reinforcement by giving special attention to his crying episodes (B) and it was found that approximately the original crying response rate was reinstituted. This experiment clearly shows that a given treatment affects the rate of responding. This experiment was replicated with another four-year-old boy, with more or less the same general results.


One common variation of A-B-A-B design is A-B-A-C-A design, which is used to assess the effects of two treatments (B and C) relative to the baseline, that is, A. With this design, the researcher can say whether B and C affect behaviour. However, the demerit is that the researcher can't assess their relative efficacies, that is, he can't tell whether one treatment is more effective than the other.


(iv) Alternating-Treatments design: This design is, in fact, a subclass of A-B-A-B design. In this design, A and B are two different treatments that are alternated randomly with a single subject.

Thus, in this design, the treatment A is withdrawn and replaced not by the baseline but by another treatment, that is, B. The basic purpose of this design is to evaluate the relative effectiveness of two or more treatments. Suppose the researcher wants to evaluate the relative effectiveness of two different methods of controlling a bad habit of thumb-sucking in a four-year-old child. He may proceed with an extended series of treatments alternated in an A-B-A-B-A-B-A-B design in which A and B are the two different methods for controlling thumb-sucking behaviour. Over an extended period of time, the researcher may come to the conclusion that method A is better than method B, or vice versa.

It is obvious that in the above alternating-treatments design, no baselines were taken. But an experiment using the alternating-treatments design in which baselines were taken has been conducted by Ollendick, Shapiro & Barrett (1981). In this experiment, the effect of two kinds of treatments on hair-twirling behaviour relative to a no-intervention condition was evaluated. As a consequence, there were three conditions: (a) no intervention, (b) positive practice (first intervention), and (c) physical restraint of hair-twirling (second intervention). The baselines were established before any of these three conditions were administered. It was found that hair-twirling decreased more in the case of the first intervention (positive practice) than in the case of the second intervention (physical restraint).



In modern times, alternating treatment design is gaining more popularity because it has the 
advantage that treatment is used without a withdrawal and return to baseline. 


2. Reversal Designs

This design is also used in single-subject research. In this design, usually two alternative incompatible behaviours are chosen and the researcher establishes the baseline for each behaviour. Subsequently, one behaviour is subjected to one type of experimental treatment and the other alternative and incompatible behaviour is subjected to another type of experimental treatment. When sufficient data have been collected, the treatment conditions are reversed so that the first behaviour receives the treatment initially given to the second behaviour and the second behaviour receives the treatment initially given to the first behaviour. Thus, virtually there is a reversal of treatment for the two behaviours. Following McGuigan (1990), one example can be cited. Two incompatible behaviours like talking and crying may be chosen for study with this design. First, the baseline for both the behaviours would be established separately. Subsequently, each time the child talks (first behaviour), reinforcement would be administered. However, if the child cries any time during that period, crying (the second behaviour) would not be reinforced (second treatment). It is expected that the child's talking behaviour will increase and crying behaviour will decrease. Once this conclusion is reached during this initial phase, the treatments are reversed during the second phase, in which the child's talking behaviour will not be reinforced and crying behaviour will be reinforced. It is likely that the reversal of treatments would change the response rate by increasing crying behaviour and decreasing talking behaviour.


3. Multiple-Baseline Designs
As its name implies, in this design several baselines are simultaneously established prior to the administration of treatments. These designs are basically replication designs. There are three kinds of multiple-baseline designs as mentioned below.
(i) Multiple-Baseline design across behaviours
(ii) Multiple-Baseline design across subjects
(iii) Multiple-Baseline design across conditions or environments

These are discussed below.

(i) Multiple-Baseline design across behaviours: In this design, the effect of independent variables across several different behaviours emitted by the same participant is evaluated. The researcher, here, takes several compatible behaviours, that is, the behaviours that occur simultaneously in the individual. He establishes baselines for each behaviour. Subsequently, a treatment is introduced for one target behaviour. If this behaviour changes following the treatment and other behaviours (control) remain stable at the baseline, the researcher concludes that the treatment is affecting behaviour. After some time, the treatment is applied to the second target behaviour of the compatible behaviours and its effects are recorded. In this way, the remaining target behaviours are subjected to treatment one after the other after some time. If the treatment is effective in changing the response rate following its administration to each of the target behaviours, the researcher will have sufficient confidence in concluding about the effectiveness of the treatment.

(ii) Multiple-Baseline design across subjects: In this design, a treatment is applied in sequence to the same behaviour in different participants who are in the same environment. This differs from the previous design, in which the same treatment is applied in sequence across different behaviours emitted by a single participant. When treatment is applied to the same behaviour of different persons who are in the same environment, sometimes a gap is followed. For example, treatment is applied to one participant after, say, one hour of establishing the baseline, to the second participant after two hours of establishing the baseline, and so on. Because of this time lag in establishing the baseline periods, this design has also been called time-lagged control design. One example of this design is an experiment reported by Singh & Millichamp (1987). This experiment was conducted on eight adult females who were residing in an institution for mentally retarded persons. These mentally retarded adults were taught to enhance their independent play by verbal and physical prompts made by the experimenter. First, baselines were established for each female during which they were instructed to play with toys. In fact, two kinds of behaviour, i.e., independent play and inappropriate play, were recorded. It was found that for each subject, the baselines for both independent play and inappropriate play were low. Subsequently, a treatment was given to each of the eight participants. During treatment or intervention, both verbal and physical prompts were given: the participants were verbally instructed to play with the toys kept in front of them and, if needed, were provided physical guidance by placing their hands on the toys and comfortably guiding their play action. It was found that after such verbal and physical prompts for some time, the amount of independent play increased in all participants and was maintained. The amount of inappropriate play stabilized and remained at a low level.

(iii) Multiple-Baseline design across conditions or environments: In this design, the treatment is applied to the same behaviour when the participants are in different environments or conditions. For example, the researcher may have four different patients in four different rooms. Different baseline periods may be established for each of these four patients, for example, 20 minutes for the first patient, 30 minutes for the second patient, 40 minutes for the third patient and 50 minutes for the fourth patient. After the conclusion of each baseline period, the treatment or intervention such as reinforcement may be introduced after certain similar responses are made by the patients in their respective rooms. If there is response increment in all patients following the treatment, the treatment is likely to be effective.

The major advantage of the multiple-baseline design is that the researcher has no need to withdraw the treatment once it has been applied. This is in contrast to the withdrawal design.
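
The staggered baselines of the four-patient example above can be laid out as a simple schedule. The following is only a minimal sketch: the function names staggered_schedule and phase_at, the patient labels and the 90-minute session length are hypothetical and are introduced purely for illustration.

```python
def staggered_schedule(baseline_minutes, session_minutes):
    """Map each patient to a (start, end) baseline and treatment window.

    `baseline_minutes` gives each patient's baseline length; treatment
    begins when that patient's baseline ends and, once applied, runs
    to the end of the session without being withdrawn.
    """
    schedule = {}
    for patient, baseline in baseline_minutes.items():
        if not 0 < baseline < session_minutes:
            raise ValueError("baseline must end before the session does")
        schedule[patient] = {
            "baseline": (0, baseline),
            "treatment": (baseline, session_minutes),
        }
    return schedule


def phase_at(schedule, patient, minute):
    """Return the phase a patient is in at a given minute of the session."""
    start, end = schedule[patient]["baseline"]
    return "baseline" if start <= minute < end else "treatment"


# Baseline lengths from the text; the 90-minute session is an assumption.
baselines = {"patient 1": 20, "patient 2": 30, "patient 3": 40, "patient 4": 50}
schedule = staggered_schedule(baselines, session_minutes=90)
for patient in baselines:
    print(patient, phase_at(schedule, patient, minute=25))
```

At minute 25 only the first patient has entered treatment while the other three are still in baseline. It is this staggering that lets a change which appears only after each patient's own treatment begins be attributed to the treatment rather than to the passage of time.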


DATA-COLLECTION STRATEGIES IN SINGLE-SUBJECT EXPERIMENTAL RESEARCH 


Like any other experimental research, in a single-subject experimental research two types of behaviour are commonly studied: overt behaviour and covert behaviour. The major focus of such research is the observation of overt behaviour. In single-subject research, there are a variety of ways to measure overt behaviour. Some of them are given below:

1. Frequency measure: In single-subject research, frequency measure is a common method of data collection. In this measurement, a simple counting of the number of occurrences of the behaviour that is observed during a given period of time is made. For example, on a mirror-drawing task, the researcher may simply count the number of times the lines drawn by the subject have crossed the boundaries of the star pattern (such crossings being counted as errors). Likewise, if a teacher wants to know how many times a student talks in the class, he may simply count the number of occurrences of such behaviour of the student. This is a simple method of data collection. Sometimes more than one behaviour is counted in this method of data collection.

2. Duration: It is a time-based method of data collection in which the actual amount of time during which the individual completes a behaviour is determined. For example, suppose a particular researcher aims at improving the finger-dexterity skill of a child. He may set up an instructional programme to teach this child to improve the skill and the researcher may have the measure of the duration of the finger-dexterity task.

3. Method of interval recording: This is also a time-based measure of data collection. It is also known as time-sampling method. In this method of data collection, the observation period is divided into, say, two intervals: observational interval and nonobservational interval. If the researcher is interested in eliminating a particular undesirable behaviour of a child by introducing some intervention, he may observe the child's behaviour for every 9-minute observation period followed by a 2-minute nonobservation period for recording observed behaviours. Although this method is a popular method of data collection, it has many serious flaws (Barlow & Hersen 1984).

4. Real-time observation method: This is an excellent method of data collection in single-subject experimental research. In this method the target behaviours of the study are recorded in their actual frequency, duration and order. Since such detailed recording needs expensive apparatuses and equipment, this method of data collection is not in much use.


In single-subject experimental research, sometimes the behaviours that are studied are not overt. Therefore, the above measures are not employed in data collection. In such a situation other measures of data collection are used. Three such measures are:

1. Psychophysiological measure: In this method of data collection, measures of skin temperature, skin resistance, pulse rate, blood pressure, etc., are taken with the help of sophisticated instruments.

2. Self-report measure: For knowing the covert behaviour and experiences of the subject, the researchers sometimes adopt the technique of self-report measure in which the subjects are asked to report about their feelings and experiences in the concerned experimental situation.

3. Response-specific measure: In this method, data about some specific interval indices of the target behaviour are collected. For example, if a researcher is studying weight reduction, his data might include the count of calories consumed and the distance covered each day by the subject.


These are the popular methods of data collection in single-subject experimental research. Of 
these various methods, the methods used in case of overt behaviours are common and 
more popular. 


EVALUATING DATA IN SINGLE-SUBJECT EXPERIMENTAL RESEARCH 


The data in single-subject experimental research are commonly evaluated through visual inspection. Statistical analysis is rarely used for analyzing the effects of experimental intervention. Through visual inspection, the experimenter or researcher examines such factors as changes in the magnitude and rate of behaviours being studied (Kazdin 1982). According to Kazdin, for assessing the magnitude of the change, the average rate of performance and the level at the change point should be examined. The average rate of performance is simply the number of occurrences divided by the number of sessions. A change in the level refers to a change or shift in performance from the end of one phase to the start of the next phase.


For assessing changes in the rate of behaviour under study, Kazdin emphasizes trend and latency of the response. The trend of the data refers to systematic changes, that is, an increase or decrease over time. The latency of change refers to how quickly the change in response occurs after beginning the treatment or withdrawal of treatment, which helps establish the effectiveness of the treatment.
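
The four factors named by Kazdin (average rate, level, trend and latency) are judged visually in practice, but each has a simple numerical counterpart. The following is only a minimal sketch: the function names slope and compare_phases, the sample data and the rule used for latency (the first session of the new phase that differs from the previous phase's average by at least a chosen threshold) are assumptions made for illustration.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of the values across sessions (trend)."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / denominator


def compare_phases(previous, current, threshold):
    """Summarize the change from one phase to the next.

    Each phase is a list of per-session counts with at least two
    sessions. `threshold` is an assumed cut-off for deciding when a
    change has occurred.
    """
    previous_mean = mean(previous)
    latency = next(
        (i + 1 for i, v in enumerate(current) if abs(v - previous_mean) >= threshold),
        None,  # no session reached the threshold
    )
    return {
        "mean_change": mean(current) - previous_mean,   # average rate
        "level_change": current[0] - previous[-1],      # shift at the change point
        "slope_change": slope(current) - slope(previous),  # trend
        "latency": latency,                             # sessions until change
    }


# Hypothetical counts per session for a baseline and an intervention phase.
baseline = [8, 8, 8, 8]
intervention = [6, 4, 2, 1]
result = compare_phases(baseline, intervention, threshold=3)
print(result)
```

A short latency together with a clear change in average rate, level and trend would be read as support for the effectiveness of the intervention; the numbers are an aid to, and not a substitute for, inspection of the graphed data.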


In evaluating the data obtained by single-subject experimental research, the researcher should examine if the average performance changes between phases, if a change in level occurs between phases, if the trend of the data as measured by slope changes for different phases and how quickly a change occurs after the treatment is introduced or withdrawn. Some researchers, having a poor understanding of the nature of the data of single-subject research, tend to boost the evaluation of such data by high-level parametric and nonparametric statistical analysis. Such practices must not be resorted to, as statistical analyses are rarely used in the evaluation of single-subject research data.


STRENGTHS AND WEAKNESSES OF SINGLE-SUBJECT EXPERIMENTAL RESEARCH 


There are some advantages or strengths of single-subject experimental research. As such, this research is commonly conducted in psychology. Some of the main advantages are as follows:

(i) The biggest advantage of single-subject experimental research is its ability to carry out a scientific investigation with only one subject (or sometimes two). This advantage is more important for a psychologist than for other social scientists, because it saves the time otherwise spent in dealing with many subjects, as in large N or between-group research, and he is able to concentrate fully on only one subject.

(ii) Single-subject research allows the researcher to control the experimental situation more effectively by establishing a good, obvious and continuous measure of the dependent variable (DV) throughout the experimental situation.

(iii) Such researches rarely require statistical tests to be performed for evaluation of data. Only through visual inspection, the experimenter examines such factors as changes in the magnitude and rate of behaviours being studied (Kazdin 1982). This point is a big advantage for those who dislike carrying out statistical computations.

(iv) Both exploratory and descriptive researches can easily be carried out with single-subject design.

(v) Single-subject research allows the researcher to eliminate and hold constant extraneous variables that don't show up until after the investigation is under way. In fact, intra-subject comparison in the case of single-subject research provides better control of extraneous variables than the inter-subject comparison, which is the case with large N experimental researches. In other words, such research has a sufficient degree of internal validity.

Despite these advantages, single-subject research has some disadvantages or limitations. 
Some of these disadvantages are as follows: 

(i) Inappropriateness: Single-subject research is not appropriate for certain types of
psychological research such as survey or ex post facto situations.

(ii) Practical limitations: Single-subject research is time-consuming. It generally
takes several fortnights or months to complete whereas many large N research designs can be
carried out in only one session. In fact, single-subject research is feasible only when the subject
remains willing to co-operate by giving sufficient time for fortnights or months. Sometimes it has
been reported that the effect of the treatment or intervention in such research is difficult to measure
repeatedly, or that the nature of the DV is such that it can't be measured repeatedly. For example, once a
problem is solved, there is no further DV measure to be taken.

(iii) Order effects: In single-subject research, there are obvious order effects which result in
confounding and limit the quality of generalization. Since there is only one subject to
whom the treatments are applied, practice under one treatment may be carried over to the
later treatment and improve performance under it, or the first treatment may
introduce fatigue into the second treatment. Still another type of order effect may limit the
generalization. Suppose, for example, the investigator is evaluating the effectiveness of a
particular therapy. Such therapy could be ineffective in the first B phase of an A-B-A-B design but
could be effective in the second B phase. From this, it may be erroneously concluded that for
successful application, the therapy should be presented, then withdrawn and then
presented again.


(iv) Baseline problems: Barlow & Hersen (1984) have discussed many problems related to
baselines in single-subject research. One of the basic assumptions of single-subject research is
that a stable baseline has been established. If the baseline varies, it is difficult for the researcher to
conclude a reliable change in behaviour following treatment or intervention. Apart from this,
sometimes in an A-B-A-B design, which has two baselines (A), the second baseline becomes
markedly different from the first. In such a situation, the researcher would not be able to assess
the effects of intervention with certainty.
(v) Irreversible effects: Another disadvantage of single-subject research is that some
treatments given to the subjects are not reversible. For example, in drug evaluation research the
impact of some of the drugs may linger up to the return-to-the-baseline phase because of their
profound effect. In such a situation the effect of intervention is difficult to isolate. Apart from
this, sometimes the effect is reversible; however, such reversal is not done because it is
unethical and/or practically difficult. Suppose the researcher is giving therapy to a phobic,
following an A-B-A design. The therapy has been given in the B phase and he is able to eliminate the
irrational fear. But in the second A phase the design requires a return to the baseline, and in this case
the phobic fear is again introduced in the patient. Obviously, this can be done by the therapist or
researcher. But will such reversal be ethical? Certainly not.
(vi) Lack of effectiveness of treatment: In single-subject research, a doubt regarding the
effectiveness of the treatment or intervention introduced often occurs in the mind. Are changes in the
behaviour of the subject due to the introduction of the IV (or a treatment)? Or is it merely because
something different from the baseline is being done? Nothing definite can be postulated. A
solution often suggested to this limitation is to use an A-B-C-B design. If behaviour
during the C phase very much resembles the behaviour during the A phase, it is probable that
the change during the B phase is specifically produced by the intervention introduced.
(vii) Some arbitrariness is often seen in introducing the
intervention or treatment. Generally, the researcher waits for some cues indicating changes in the
subject's behaviour and then introduces or withdraws the treatment. Such decisions by the
researcher are not scientific. To remove this limitation, the researcher should fix the
length of the baseline and treatment phases before data collection.


(viii) In comparison to the between-groups experiment, single-subject research has a
poor degree of external validity or generalizability. Since such research is conducted on only
one subject and such studies are difficult to replicate systematically, the generalization of the
results remains limited.




Thus, we find that single-subject research has some limitations, despite some of its
advantages. Still, single-subject research has been a popular method of research in the
Experimental Analysis of Behaviour as well as in other areas of research such as drug evaluation
and behaviour therapy.


COMPARISON BETWEEN SINGLE-SUBJECT RESEARCH AND LARGE N RESEARCH 


Single-subject research and large N research are both experimental researches where the IV is
manipulated and its effect upon the DV is examined. However, there are major points on which they
differ.

1. Control techniques: Large N research generally employs randomization, making an
extraneous variable an independent variable, and various statistical controls whereas
single-subject research does not. Single-subject research does use randomization as a means of
subject-selection but not as a technique of controlling extraneous variables in the experiment.
Although both large N and single N or small N experiments employ elimination and constancy as
control techniques, the ways they use them differ. In large N experiments, extraneous variables
are eliminated or held constant at the beginning of the investigation. In single N experiments,
elimination and constancy are employed as the investigation is carried out, as well as at the outset.
In such experiments, the experimenter not only tries to hold the extraneous variables constant but
also increases his control over the experimental situation by setting continuous measures of
dependent variables. We also find that large N experiments control chance fluctuations by using
a large number of subjects whereas single N experiments control them by using an increased
number of responses of the subject.

2. Manipulation of dependent variables: In large N experiments the experimenter generally
sets up his experimental design, controls the extraneous variables as far as possible, applies an
independent variable and takes a one-shot measure of the dependent variable (or DV). Here little
is done about the DV except this one-shot measure. But with single N research, the experimenter can
actually change the DV by varying the extraneous variables. In fact, this procedure is carried out
in an effort to develop a stable DV measure prior to applying the independent variable.

3. Monitoring the experimental data: In large N experiments, the experimenter makes little
effort to monitor and analyze the data until the experiment is over. But in single N experiments, the
experimenter continually monitors his data so that a watch can be kept on changes in extraneous
variables that may occur during the experiment itself. Such monitoring also becomes necessary
because shifting from one condition to the other is usually determined by the behaviour of
the subject.

4. Data analysis: In large N experiments the experimenter usually uses statistical tests (t
test, F test, Pearson r) for determining the significance of the data. Frequently in a single N or small
N experiment, no statistical test is used in evaluating the data. Very simple measures like
averages are computed or the data are simply presented in a graphic form for visual comparison
of the subject's behaviour from one session to another.

5. External validity of the results: External validity refers to the generality of the results. In
large N experiments, the experimenter tends to increase the generality of the results by increasing
the number of subjects in the investigation. Many researchers believe that simply by increasing
the number of subjects, the generality of results can be increased because the mean and
variance of the sample will closely approach the true mean and variance of the population. But
this, in fact, is not a correct assumption. Suppose an experimenter finds that a particular
method of teaching is effective on a fifth grade student. Can the effectiveness of the method
be generalized to a mentally retarded child simply by adding additional normal children
of the fifth grade? The answer will definitely be 'No'. In fact, the extension of generality from one
type of situation to another can be carried out through inductive reasoning. By
comparing the similarities and differences between the two situations, the experimenter
determines how generalizable the data is. Single-subject experiment tends to determine the
generality of results by this inductive reasoning rather than by inferential statistical methods as we
find in the case of large N experiments.

Thus, single-subject research differs considerably from large N research in various important
aspects such as employing control techniques, data analysis, generality of data, etc.


SMALL N DESIGN: NATURE AND HISTORICAL PERSPECTIVES

At the time psychology emerged as a new science in the second half of the 19th century,
statistical analysis was in its infancy. Galton was planning to conceptualize correlation
techniques, and higher inferential statistics were used only after Fisher's work on Analysis of
Variance (ANOVA) appeared in the 1930s. Before this, small N studies prevailed in psychology.
As the name implies, researches based on small N designs are those in which more than one
subject, that is, a small number of subjects participate.
Some of psychology's pioneers like Hermann Ebbinghaus, Charles Darwin, and Watson and
Rayner used the smallest N possible, that is, they studied their own behaviour or the behaviour of
a single individual. Ebbinghaus conducted an exhaustive study of his own ability to learn and recall
lists of nonsense syllables. Charles Darwin conducted a study on child development by keeping a
detailed record of the activities of his own son's childhood. Watson and Rayner conducted a study on little
Albert.
At Leipzig University in Wundt's laboratory, experiments based on small N designs were
very popular. These studies normally involved a very small number of research participants who
were mostly doctorate students. For example, the bulk of J M Cattell's dissertation research on
reaction time included data from just two persons: Cattell and his fellow student Gustav Berger.
In 1893, Dressler also conducted his famous study of facial vision using only three participants.
Although small N studies and many similar other studies were very popular, large N studies were
not completely absent. Such large N studies could be found in educational psychology and in
child study research (Danziger 1985).


Reasons for Small N Design 
There are several reasons why small N studies are used. Some of the important ones are as under: 
(i) The advocates of research featuring one or just a few subjects argue that grouping
data can obscure individual performance and produce misleading results. The process of
summarizing data from large groups of individuals sometimes produces results that, in fact, can
fail to describe any person who took part in the study. This viewpoint has been argued by
Sidman, particularly in a book entitled Tactics of Scientific Research (1960). Since group averaging
obscures individual differences between subjects, Sidman (1960: 274) has rightly
opined, “group data may often describe a process or a functional relation that has no validity for
any individual.”


(ii) There are some practical problems with large N design. Such
researches become very cumbersome because potential subjects are difficult to find.
This happens especially in the fields of abnormal and clinical psychology where the researcher
wants to study a particular disorder, for example, where he wants to test the effectiveness of a
therapy for those who are suffering from hypochondriasis. If a large N design is used to compare
groups, matching the participants on characteristics like gender, intelligence and socio-economic
status would be necessary. For obtaining two matched groups, a relatively large initial sample
(probably more than 100) of people suffering from hypochondriasis is needed. Now the researcher
may not be getting such a large sample or he will have to wait for a longer period of time.

A closely related problem also occurs in some animal research, especially if surgery is
required. The procedure of surgery is time consuming and expensive and the animal colony itself
can be difficult to maintain. Some animals being studied might be hard to obtain or
may require long training. For example, research on teaching sign language to chimpanzees
and other apes may require hundreds of hours per animal. If a large N design is used, many such
animals will be required and this may prove to be a Herculean task for the researcher. The
study conducted by Patterson and Linden (1981) has shown that in the course of being taught sign
language, a gorilla could learn more than 400 different signs by the age of 10. The study
had begun when the ape was just a year old.
(iii) Still another related reason why small N research should be preferred has been provided
by Skinner who believed that the best way to understand, predict and control behaviour is to
study a single individual intensively. The researcher can derive general principles only after
exhaustive study of individual cases. In simple words, psychology should be an inductive
science, reasoning from specific cases to general laws of behaviour. As Skinner (1966, 21) has
said, the researcher should “study one rat for a thousand hours” rather than “a thousand rats
for an hour each or a hundred rats for ten hours each”.
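
Sidman's point in reason (i), that a group average may describe no individual, can be shown with a small numerical sketch. The Python example below is only an illustration under invented assumptions: three hypothetical subjects each learn in an all-or-none fashion on different trials, and the names `subjects`, `group_average` and `matches_any_individual` are made up for this sketch.

```python
# Hypothetical all-or-none learners: each subject's score jumps from 0 to 1
# on a different trial. The values are invented for illustration only.
subjects = {
    "S1": [0, 1, 1, 1, 1, 1],
    "S2": [0, 0, 0, 1, 1, 1],
    "S3": [0, 0, 0, 0, 0, 1],
}

def group_average(curves):
    """Average the subjects' scores trial by trial."""
    n = len(curves)
    return [sum(trial) / n for trial in zip(*curves)]

def matches_any_individual(avg, curves):
    """True if the averaged curve is identical to some subject's own curve."""
    return any(list(curve) == avg for curve in curves)

avg = group_average(list(subjects.values()))
print("group average:", [round(x, 2) for x in avg])
print("matches an individual:", matches_any_individual(avg, subjects.values()))
```

With these invented scores the averaged curve rises in steps across trials, suggesting gradual learning, although every individual curve shows a single sudden jump. The averaged curve therefore matches none of the three subjects.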

Thus large N researches may occasionally fail to reflect the behaviour of the individuals and
they may not be feasible even if they are desired. This naturally paves the way for small N
researches.


Review Questions 


1. What is meant by single-subject experimental research? Discuss the general procedures
of this research.

2. Discuss the different important designs of single-subject experimental research.

3. Outline the important data-collection strategies in single-subject experimental research.
How will you evaluate the data obtained in such researches?

4. Point out the major strengths and weaknesses of single-subject experimental research.

5. Make a comparison between single-subject experimental research and large N
experimental research.

6. Under what conditions is single-subject experimental research considered more
appropriate than a multiple-group experimental research?

7. What types of research questions can't be easily answered by conducting single-subject
experimental research?

8. How will you defend the criticism that a single-subject experimental research lacks
external validity?

9. Why do we call A-B-A-B design a reversal design?

10. Distinguish between baseline and intervention stage of single-subject experimental
research.

11. Discuss the probable reasons for small N design.


17
HISTORICAL RESEARCH


CHAPTER PREVIEW 


• Meaning of Historical Research and Its Necessity
• Steps in Historical Research
• Sources of Historical Data
    Primary Sources
    Secondary Sources
• Historical Criticism
    External Criticism
    Internal Criticism
• Limitations of Historical Research


MEANING OF HISTORICAL RESEARCH AND ITS NECESSITY 


Historical research is very different from the research conducted by most of the behavioural
scientists. Kerlinger (1986, 620) has defined historical research as “the critical investigation of
events, developments and experiences of the past, the careful weighing of evidence of the
validity of the sources of information on the past, and the interpretation of the weighed
evidence”. Thus, historical research refers to the application of scientific method to the
description and analysis of past events. In fact, history, the meaningful record of past human
achievements, deeds and misdeeds, helps us understand the present and, to some extent,
predict the future course of events. It should not be thought that history is merely a list of
chronological events. Rather, it is supposed to be a factual integrated account of the relationships
between persons, times and events. Historical analysis can be directed towards an individual, an
idea, a movement or an institution. But none of these objects of historical observation can be
treated in isolation because of the intense interrelationship among institutions, movements and
man.

Historical research becomes necessary not only for knowing the past but also for
understanding the present and predicting the future in that context. Substantiating the role of
the past, the historian Arthur Schlesinger rightly commented, “No individual, let alone a social
scientist, can ignore the long arm of past.” The noted thinker, George Bernard Shaw, also
commented in much the same vein, “The past is not behind the group; it is within the group.”

There are some social scientists who try to establish a link between past experiences on the
one hand and present attitudes and values on the other. Among these social scientists, the
names of W I Thomas, well known for his concept of “the processing of becoming”, and T W
Adorno et al. (1950), the authors of The Authoritarian Personality, are important. Adorno et
al. vigorously demonstrated that the roots of prejudice were in early family experiences. They
examined various techniques of combating such prejudices at the source. The Committee on
Historiography of the Social Science Research Council also concluded that the analysis of the
dynamics of change and development was now regarded as a fruitful step towards understanding
the relationship of man, institution and movement. Anthropologists also utilized the historical
approach for studying a culture group. In fact, a historical investigator, like other investigators, collects
data, evaluates the data for validity and interprets the data using historical method or historiography.
The relevance of historical data for social research is great in the following three types
of situations.
(a) When they are presented as a complex phenomenon of social forces or social dynamics
(b) When social phenomena reflect complex social processes in a meaningful way
(c) When various dimensions of interrelationships such as psychological, economic,
educational, religious and political make significant contribution towards a unified
complex pattern

Historical research may be qualitative or quantitative or both depending upon the nature of the
problem and the type of data processed (Best & Kahn 2006).

STEPS IN HISTORICAL RESEARCH 
Like a behavioural researcher, a historical researcher also follows some steps in sequence. The 
following are the major steps: 

(i) The first step in historical research is the selection of a problem. All problems are not 
suited for historical research. Only those problems are to be picked up for the study 
which need scanning of historical records and have some social utility. No current 
problem can be meaningfully studied with the help of historical research. 

(ii) The second step will be the formulation of some hypotheses. Hypotheses are explicitly
formulated in historical investigations of education. The historian gathers evidence and
then carefully evaluates the trustworthiness of the hypothesis. If the evidence is
compatible with the hypothesis or its consequences, that hypothesis is confirmed. It is
through the synthesis of hypotheses that historical generalizations are
ultimately established.

(iii) The third step is the collection of data. Historical data are collected with the help of primary
sources or secondary sources. Primary sources are eyewitness accounts reported by actual
observers or participants in the event. Secondary sources are the accounts of an event
that are gathered after the reporter has talked with the actual observer or has seen the
accounts of an observer.

(iv) At this step, the data are analyzed and a generalization is arrived at. Precautions are taken
to see that all the data have been collected and that nothing has been left out. In
reaching conclusions, historians employ the principles of probability as is done by a
physical scientist. They rigorously subject the evidence to critical analysis in order to
establish perfect authenticity and correctness.


SOURCES OF HISTORICAL DATA 
Like any other research, collection of data is an important aspect of historical research. There are 
two major sources of data collection in historical research. 


(1) Primary Sources
Primary sources are eyewitness accounts of events reported by an actual observer or participant
in an event. Such important sources, which are commonly used, are mentioned below.

(a) Documents 
Documents are those records which are kept and written by actual participants or direct
witnesses of an event. These sources are also produced for the purpose of transmitting
information to be used in the future. Documents which are included as primary sources are
official minutes or records, constitutions, certificates, declarations, licenses, deeds,
affidavits, books, diagrams, paintings, pictures, newspapers, catalogues, films, advertisements,
maps, handbills, recordings, findings of research, reports, and so on.


Direct use of documents proves more useful:
(i) When events, which these documents depict, have not yet been analyzed by
the historians
(ii) When such events have not yet been incorporated into the writings of the cultural
historical settings
(iii) When the purpose is to verify certain events directly
(iv) When certain aspects of life in which the investigator is interested have not been
embodied in later writings of analytical historians
(v) When there is a missing link in knowledge that needs to be connected so that the whole
event may be meaningfully understood in a complex whole
(vi) When complete chronology of events is lacking
(vii) When some controversial points need to be settled
One precaution in the use of documents is that, for gaining perspective, they should preferably
not be used until they are at least a few generations old (Young 1992). But contemporary
documents may be the only source of information available at the time, and the closer a person
remains to the time, spirit and scene of action, the more able he will be to understand
the events.

(b) Remains or relics 

Remains or relics are objects which are associated with some persons, groups or periods. Fossils,
skeletons, tools, utensils, weapons, clothing, buildings, coins, furniture, pictures and art objects
are examples of such remains or relics, which, though not directly intended for use in
transmitting information, provide clear evidence of the past life and happenings.


(c) Oral testimony 
Oral testimony is the spoken account of a witness or a participant in an event. Such evidence
can be obtained in a personal interview or can be recorded or transcribed as the witness or
participant relates his or her experiences.


(2) Secondary Sources 
Secondary sources are accounts of an event that was not actually witnessed by
the reporter. The writer of a secondary source is not on the actual scene of the event; rather, he
merely reports what the person who was there said or wrote. Many books on history and
encyclopedias are examples of secondary sources because they are often far removed from the original
events. Secondary sources of data are used less than primary sources because of the errors
that may result when information is passed from one person to another.

Some types of material may be used as primary sources for some purposes and secondary
sources for some other purposes. For example, a high-school textbook on Indian history is
ordinarily a secondary source. But if the investigator is studying the gradually changing emphasis
on nationalism in high-school textbooks on Indian history, the book would be a primary
document or data.


HISTORICAL CRITICISM 

Historians are often in difficulty in obtaining trustworthy and authenticated data because they
ordinarily don't use the method of observation. Since past events can't be repeated at the will of
historians, they are forced to depend upon those who witnessed or
participated in these events. They are required to make a careful analysis so as to distinguish
unerringly between a true and a false event or between relevant and irrelevant information. The
data which are trustworthy, authentic and worth using are called historical evidences. A
historical evidence is derived from historical data by the process of criticism which is of two
types: external and internal.

(1) External Criticism
External criticism is one which establishes the authenticity or genuineness of the historical data.
Historians are concerned with the genuineness of data. For this purpose, they thoroughly
examine such features as language, documentation, knowledge available at that
time and consistency with what is known. Sometimes historians resort to physical and chemical tests of ink,
paint, paper, cloth, metals or wood. They try to establish whether or not these elements are
consistent with known facts about the person, knowledge available and the technology of the
period from which the remains have been obtained.


(2) Internal Criticism
Internal criticism is one in which the historian tries to evaluate the accuracy or worth of the data.
External criticism reveals only the genuineness of the data, but whether or not these data reveal a
true picture is indicated only by internal criticism. Here historians try to establish whether the
writers or discoverers of the data were honest, unbiased and actually acquainted with the facts or
whether they were themselves biased or too antagonistic or too sympathetic to give a true picture,
whether they were in agreement with other competent authorities of that period, and whether they
wrote about the events freely or were subject to pressure, fear or vanity. Although these questions
are difficult to answer, historians try to establish that the data are not only authentic but also
accurate, failing which they can't introduce historical evidence worth consideration.




LIMITATIONS OF HISTORICAL RESEARCH 
The historical method of social research has some limitations which are accepted even by historians.
Some of these limitations are given below.

(i) One common limitation is the doubt whether or not the data being made available are
dependable. The problem becomes serious when the data relate to a distant past. Modern
historians often agree that few narrators of historical events report them without injecting
some impressions of their own, a fact that introduces an impressionistic and
propagandistic element.

(ii) No life-size writing of historical events is possible. Each historian records what he
considers to be relevant and ignores what he considers irrelevant. Accordingly, no life-size
history can be written. This introduces an element of bias in historical research.

(iii) Not all the happenings in time and space can be known at the time of writing. Historians 
try their best to collect maximum information from all sources. Despite their honest intention, 
some happenings, especially those in distant villages or places, may not come to the notice of 
historians. As such their record may remain incomplete and only partially dependable. 

(iv) Some personal biases and private interpretations often enter unconsciously into the
interpretation of historical events. Historians, on the one hand, often find it imperative to omit a
mass of detail, and on the other hand, in the remaining mass they retain, they often interject their
own conclusions which are often deviated and biased ones.

(v) In historical research, sometimes difficulties arise in recognizing the problem. Often it is
seen that historians state too broad a problem which is not worth investigating. Therefore,
historians must select a penetrating analysis of a limited problem rather than a superficial sketch
of only some broad area.

Thus, we see that historical researches have some limitations which tend to make the
generalizations vague and the conclusions inappropriate.





Review Questions

1. Discuss the major sources of data in a historical research. Also, point out the
limitations of a historical research.

2. What is meant by 'historical research'? Discuss the steps involved in a historical research.

3. Point out the relevance of historical criticisms in historical researches.


18
THE PROBLEM AND THE HYPOTHESIS


CHAPTER PREVIEW 


• Meaning and Characteristics of a Research Problem
• Sources of Stating a Research Problem
• Important Considerations in Selecting a Research Problem
• Ways in Which a Research Problem is Manifested
• Types of Research Problems
• Importance of Formulating a Research Problem
• Steps in Formulating a Research Problem
• Meaning and Characteristics of a Good Hypothesis
• Formulating a Hypothesis
• Ways of Stating a Hypothesis
• Types of Hypotheses
• Sources of Hypotheses
• Functions of Hypotheses


MEANING AND CHARACTERISTICS OF A RESEARCH PROBLEM 
Any scientific investigation starts when a person has collected many facts but all that can be said
on the basis of those facts is that there is something which we do not know. Here, a problem
originates. A problem is manifested in at least three ways. First, when there is a clear gap in the
results of several investigations in the same field, a problem is said to exist. Second, when the
results of several investigations disagree with each other, a problem is manifested. Third, when
the facts in any field are found in terms of unexplained information, a problem is said to exist. The
statement of a research problem is often a difficult task for the researcher. Sometimes he has only
a diffused and even confused idea regarding the problem. Before selecting and formulating a
problem, he must know what is meant by a research problem or a problem. A problem statement
may be defined as an interrogative testable statement which expresses the relationship between
two or more than two variables. Analyzing this simple definition of a problem statement, it can be
said that there are three important characteristics of a problem statement:
1. A problem statement is written clearly and unambiguously, usually in question form. A few
examples of problems may be:
What is the relationship between IQ and classroom achievement?
What is the relationship between anxiety and intelligence among school children?
Do students learn more from a democratic teacher than from an authoritarian teacher?
All the above examples state a problem in question form.
Research questions, whether they are prespecified or developed during the research
work/project, are central and serve the following functions:
(i) They organize the project and give it direction and coherence.




(ii) They delimit the project and display its boundaries.
(iii) They keep the researcher focused and attentive during the project.
(iv) They also point to the data which will be needed.
(v) They provide a general framework for writing up the project.

Researchers have revealed some characteristics of good and bad research questions. The
important characteristics of a good research question are as under:
(i) A good research question is clear: A good research question is unambiguous and easily
understood.
(ii) A good research question is specific: A good research question possesses concepts
which are specific to the extent that they are connected to data indicators.
(iii) A good research question is answerable: A good research question clearly shows
what data are required to answer it and how the data will be obtained.
(iv) A good research question is interconnected: Good research questions are related to
each other in a meaningful way. They are in no way unconnected.
(v) A good research question is substantively relevant: A good research question is an
interesting and worthwhile question for the research effort.
Bad questions, on the other hand, are questions which fail to satisfy one or more of the above
criteria. Mostly this happens because such questions are unclear and not specific.
2. A problem statement expresses the relationship between two or more than two variables.
This kind of problem, which expresses the relationship between two or more variables, permits
the investigator to manipulate one of these variables and examine its effect upon the other
variable. Such a problem is completely different from a problem in a descriptive study where the
investigator cannot manipulate a variable; rather, he simply observes or counts the occurrence of
a particular variable. For example, if the investigator puts the question, “How many students have
an IQ above 110 in Class VII?”, it will be an example of a problem in which he is required not to
manipulate, but simply to count the students having an IQ higher than 110. This illustrates the
problem found in a descriptive study. But suppose he puts the problem like this: “Is higher IQ
related to classroom achievement?”, it illustrates the problem found in a scientific study because
the problem statement explores the relationship between IQ and classroom achievement.
3. A problem statement should be testable by empirical methods. In other words, a problem
statement should be such that it can be tested through the collection of data. A problem
statement in which the stated relationship between variables cannot be tested is not a
scientific problem.
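The contrast between a descriptive problem (counting) and a problem that relates two variables can be sketched in a few lines of Python. This is only an illustration: the six records below are hypothetical, invented for this sketch, and the Pearson product-moment correlation is assumed here as one common index of relationship; the text itself does not prescribe a particular statistic.

```python
from math import sqrt

# Hypothetical records for a Class VII section: (IQ, classroom achievement score).
students = [(95, 52), (102, 58), (108, 61), (112, 70), (118, 74), (125, 83)]

# Descriptive problem: simply count the occurrence of a characteristic.
high_iq_count = sum(1 for iq, _ in students if iq > 110)


def pearson_r(pairs):
    """Pearson product-moment correlation between the two columns of pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Relational problem: examine how two variables go together.
r = pearson_r(students)

print(f"Students with IQ above 110: {high_iq_count}")
print(f"Correlation between IQ and achievement: {r:.2f}")
```

The count answers only "how many?", whereas the correlation speaks to the relationship between the two variables, which is what the second characteristic of a problem statement requires.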


Apart from these characteristics, there are three additional characteristics of a good
problem statement.
First, a problem statement should avoid moral or ethical judgements simply because such
statements are very difficult to study. For example, questions like the following should be
avoided: Should a widow remarry? Should children avoid cheating in all situations? Should a
person not tell a lie in all situations?
Second, the problem should be of sufficient importance, and it should not be too expensive in
terms of time, money and effort; otherwise the investigator may not obtain any answer
immediately or even in the future.
Third, a too general or broad problem is quite useless because it cannot be tested scientifically.
The problem must be reduced to a workable size, that is, it must be specific. But care should be
taken not to be too specific because “too great specificity is perhaps a worse danger than too
great generality” (Kerlinger 1975, 24). A too specific problem is, therefore, a trivial problem.















SOURCES OF STATING A RESEARCH PROBLEM 
Stating a suitable problem is always a difficult task for the researcher. To resolve the difficulty, it is
essential that the researcher acknowledges the likely sources from which he may obtain a
suitable research problem. These sources are as follows:


1. A careful examination of problems which are being faced by teachers, students and
guardians may throw light upon several topics that are worth investigating. For example, student
unrest is one of the most important problems which school and college teachers are facing today.
Thoughtful deliberations may bring to light several problems worth investigating, e.g., Why do
students participate in unrest? In which age group is the unrest most obvious? To what extent are
teachers and guardians responsible for student unrest?

2. Those who are looking for a suitable problem should consult textbooks and research
journals. Many research articles suggest problems for further investigation. Not only this, they
also demonstrate the use of different techniques and procedures to be followed in investigating a
certain problem.

3. Problem seekers may also consult research professors, instructors and advisors, who are
regarded as the most competent persons for suggesting a suitable problem.

4. Social changes and educational innovations constantly bring forth new problems which
are worth investigating. These new problems pave the way for new researches. The problem
seeker may find a suitable research problem growing out of the various social changes and
educational innovations.

Best & Kahn (1998) have outlined several different sources of locating a research problem.
Among these, the important ones may be outlined as mentioned below:

(1) Programmed instruction
(2) Television instruction
(3) Team teaching
(4) Homework policies and practices
(5) Field trips
(6) Extracurricular programmes
(7) Students' out-of-school activities: employment, recreation, cultural activity, reading and
television viewing
(8) Teachers' out-of-school activities: employment, political activity and recreation
(9) Open classroom
(10) Multiple textbooks
(11) Independent study programmes
(12) Educational organizations
(13) In-service programmes
(14) Sex education
(15) Non-school-sponsored social organizations or clubs
(16) Attribution of success and failure
(17) Special education
(18) Comparison of the effectiveness of two teaching methods/procedures
(19) Factors associated with selection of teaching/nursing/social work as a career
(20) Case studies
(21) Socio-economic studies and academic achievement
(22) Administrative leadership
(23) Stress and achievement
(24) Self-image analysis
(25) Vocational objectives of students

The above topics are only general sources of locating a problem. Therefore, a great deal of work
must necessarily be done in order to carve out a research problem from them.

Young (1992) has suggested the following three major general sources of locating a research
problem in the field of behavioural sciences:

(1) Documentary sources: These sources include the analysis and inspection of official
statistics, unofficial statistics, local newspaper accounts, census statistics and publications from
which different descriptive materials are easily derived.
(2) Personal sources: These sources include information from persons who have the
right knowledge and insight into the data desired, as well as from those persons who
are directly involved in the commission of the concerned act or behaviour.
(3) Library sources: Such sources include the study of various textbooks, journals, monographs
and newspaper analysis.

These sources supply both practical and theoretical knowledge and help a lot in
accumulating pertinent data regarding the formulation of a research problem.
Obviously, then, there are different sources from which a research problem can be
finally traced.

IMPORTANT CONSIDERATIONS IN SELECTING A RESEARCH PROBLEM
Before the selected research problem is considered appropriate, and before the actual
investigation starts, several questions should be raised. When the questions are positively
answered, we can regard the problem as a good scientific problem and investigation can
be started.

1. Is the problem significant? Are there enough variables to be investigated? Will the
solution of the problem make a significant difference in current psychological, educational and
social theories or practices?


2. Can the problem be solved by the process of research? Is the problem such that relevant 
data can be gathered and an answer to the problem be found? 


3. Is the problem a new one? If the problem is such that it has already been investigated, a
researcher will fruitlessly spend time and money in replicating the study. Therefore, the problem
should be an original and novel one, although there are times when it becomes necessary to
replicate a previous study so that the conclusion can be verified or the findings can be
generalized to a broader situation.

4. Has the problem theoretical value? Will the problem fill a gap in the literature? Will it
contribute to the development of a theory?

5. Is the problem workable? A problem may be a good one but it may not suit the researcher.
The researcher should get the answers to the following questions in the affirmative: Am I
able to complete the research? Will I have adequate financial resources to complete the
research? Will I have the cooperation of subjects in the proposed research problem?

6. Is the problem such that pertinent data is made available? Are good data-collecting
instruments available?

A problem is good and researchable if the researcher gets answers to the above questions in
the affirmative.




WAYS IN WHICH A RESEARCH PROBLEM IS MANIFESTED
A problem is said to exist when we know that there is something we do not really know. There are
at least three ways in which a problem is said to be manifested.

(1) A noticeable gap in the results of investigations
(2) Contradictory results of investigations
(3) Isolated fact in the form of unexplained information

These three may be discussed as given below.
(1) A noticeable gap in the results of investigations: A problem is manifested when there is a
noticeable gap or absence of information. Suppose a community or group intends to provide
psychotherapeutic services. Here, two questions arise: firstly, what kind of psychotherapy they
should offer and secondly, which one of the different forms of therapeutic methods is most
effective for a given type of mental disease. These questions are highly important, but there are
few scientific studies which provide answers to these questions. Therefore, a noticeable
gap in the knowledge exists. Hence, there is the need for the collection of necessary data and
their explanation for filling the gap.

(2) Contradictory results of the investigations: When several investigations done in the same
field yield results that are contradictory, a problem of finding a new
solution and settling the controversy is easily manifested. If we review those experiments that have
been conducted to examine the effect of reinforcement in learning, there are some experiments
which confirm the fact that learning can't take place without reinforcement whereas there are
some experiments which have vividly confirmed the fact that learning can take place even in the
absence of reinforcement. Such a contradictory situation presents a problem. The investigators
immediately want to know: Why does reinforcement affect learning in one situation but not in
others? What are those situations in which reinforcement affects learning and those situations in
which reinforcement does not affect learning? Likewise, there are experiments
in which basically the same problem has been addressed: What is the effect
of rest periods on learning a task? Three independent and contradictory results
have been obtained. One type of experiment has shown that progressively increasing rest
periods are better for learning, another type of experiment has revealed that progressively
decreasing rest periods are better for learning, while the third type of experiment has revealed
that the effects of progressively increasing or progressively decreasing rest periods are about the
same. Now, in such a situation the question arises: Why do these three types of experiments
provide us with conflicting results? In an attempt to locate the possible reasons, the researcher
may think of extraneous variables which might have been left uncontrolled and therefore, there
were differences in the results of the experiments. So, the problem here arises: how to locate
those extraneous variables, what their possible solutions can be, and so on.

(3) Isolated fact in the form of unexplained information: When the researcher possesses a
fact and at the same time is interested in seeking the answer to the question, why this is so, he
becomes aware of the problem. Putting it in other words, a fact which exists in isolation from the
rest of our knowledge demands an explanation. In fact, anything in itself presents no problem,
but if it does not fit in with the existing knowledge, a problem is said to be manifested. In such a
situation, collection of new information becomes necessary so that eventually the researcher can
expect that the new fact will be related to the additional information in such a way that it is
adequately explained. This can be illustrated from Max Wertheimer's discovery of the
phi-phenomenon, which refers to the perception of a particular kind of movement involving
stimuli that are actually stationary. It is said that Wertheimer, while sitting in a running train, saw
movement from one telephone pole to another instead of seeing mere stationary telephone poles,
that is, first one and then another. Later on, Wertheimer systematically studied the
phenomenon, using Köhler and Koffka, who later became important psychologists, as
subjects. All three psychologists, that is, Wertheimer, Köhler and Koffka, held that the
phi-phenomenon could not satisfactorily be explained by the existing theories in psychology
and, therefore, they advanced a new theory, giving rise to what is known as Gestalt theory
in psychology. The essence of their viewpoint was that behaviour could not be
understood by analyzing the situation into parts; rather, it could be understood only by studying
the whole complex situation.
Thus, there are three primary ways in which the research problem is manifested. A research
student must have, by now, adequate knowledge about these ways of manifestation.


TYPES OF RESEARCH PROBLEMS
Research problems can broadly be categorized into two types.
(a) Solvable problem
(b) Unsolvable problem
A solvable problem is one that raises a question which can be answered with the use of our
normal capacities. An unsolvable problem is one that raises questions which can't be answered
with our normal capacities. Unsolvable problems are usually concerned with supernatural
phenomena. ‘Does reward facilitate learning?’ A problem like this is an example of a solvable
problem. ‘How does our mind work?’ and ‘Is there a divinity that shapes our ends?’ are
examples of unsolvable problems. The chief distinction between a solvable problem and an
unsolvable problem is that solvable problems can empirically be solved by studying observable
events, whereas unsolvable problems can't be empirically studied.
Science addresses itself to only solvable problems. A problem is said to be solvable if it is
possible to advance a suitable hypothesis as a tentative solution to it. A solvable problem, among
other things, has the following two features:
(i) A solvable problem can be solved by empirical methods. An empirical method is one
which depends upon observation of natural events.
(ii) In case of a solvable problem, it is possible to advance a suitable hypothesis as a
tentative solution to it.

An unsolvable problem has three characteristics which help the researcher in recognizing it
and making it distinct from a solvable problem. These characteristics are as follows:

(i) An unsolvable problem possesses the trait of unstructuredness. Such a problem remains
unstructured because the interest of the researcher is not clear and the area it refers to is so
unorganized that it no longer remains possible for the researcher either to specify the relevant
observations or to relate the observations to these vague formulations. For example, ‘Can human
nature be changed?’ is an example of such an unsolvable problem in which a sufficient degree of
unstructuredness exists.
(ii) An unsolvable problem has inadequately defined terms. The operational definition of
the problem contains vagueness, and its variables are such that they can't be defined operationally.
In the above example, human nature is vague and it can't be defined operationally.
(iii) The third characteristic of an unsolvable problem is that relevant data can't be
collected. Sometimes a problem is well structured and its terms are operationally defined, yet the
researcher can't collect relevant data.
These characteristics are such that an unsolvable problem can readily be distinguished from
a solvable problem.




IMPORTANCE OF FORMULATING A RESEARCH PROBLEM
The formulation of a research problem is one of the first and significant steps in any research
process. The formulation of a research problem is like the identification of a destination
before starting a journey. Just as a traveller who has not identified a destination cannot choose
a route, a researcher who has not formulated a good research problem
cannot undertake clear and defined steps of the research process.

A good research problem is like a good foundation of a building. As we know, the strength of
a building depends upon the design and a solid foundation; so also good research and its
outcome are dependent upon the formulation of a good research problem. Kerlinger (1986) has
rightly opined that if the researcher really wants to solve a problem, he must know what the
research problem is. He must have a clear idea with regard to what it is that he wants to find out
and not only what he thinks he must find.

A research problem may be simple or complex. The way the researcher formulates the
problem determines the steps to be followed in research: the type of variables to be included, the
type of the research design that can be used, the type of sampling strategy to be employed, the
tests/scales to be used as well as the type of analysis that can be undertaken. In simple words, it
can be said that the formulation of the problem is like the input into the research study and the
output is the cause-and-effect relationship or association established. If the input is good, the
output is bound to be good. On the other hand, if the input is vague and questionable, the output
is bound to be questionable and unbelievable. Thus the famous saying about computers, ‘garbage
in, garbage out’ (GIGO), also applies to a research problem.

STEPS IN FORMULATING A RESEARCH PROBLEM
Formulating a research problem is one very important part of a research process. It consists of a
number of steps, which a student or the researcher must know. These steps are described below.

1. Identifying the broad field or area of interest: The first step in formulating a research
problem is to identify a broad field or subject that is of interest to the researcher. For example, if
the researcher is interested in studying the treatment of abnormal behaviour of the person, he
may conduct research on a subject area in the field of clinical psychology. If he is interested in
how people form impressions about others, he may select a research topic from social psychology.
Likewise, if the researcher is interested in studying supervisory behaviour, he may undertake or
explore the field of industrial/organizational psychology. These are the broad research areas,
which are of interest to the researcher.

2. Dissecting the broad area into subareas: The next step in formulating a research
problem is to dissect or divide the broad area into small subareas. For example, the field of
clinical psychology may be dissected into small subareas like mental health of the persons,
stressful behaviour, causative factors of mental retardation, types of phobic behaviour, etc. While
the researcher is preparing the list of subareas, he should also have consultations with others who
are experts and know the subject area well.

3. Selecting any subarea that is of most interest to the researcher: For any researcher, it is
not feasible to study all subareas. Therefore, the researcher selects subareas that are of interest to
him. Besides the interest of the researcher, there are many other considerations in selecting a
problem and these have already been discussed. One easy way to select a subarea of interest is to
decide through the process of elimination. The researcher goes through the list and deletes those
subareas which are of little interest. Through this process, gradually towards the end, he finds
that he is left with the subarea which is of high interest, which he considers manageable, and
that matches his level of expertise and related resources. Thus a subarea of the research is
demarcated.
4. Raising research questions: After selecting the subarea, the researcher raises research
questions. At this step, the researcher raises whatever questions he wants to answer in the
subarea. If too many questions are raised, again through the process of elimination, he picks up
some manageable questions and eliminates other questions.
5. Formulating objectives: At this stage, the researcher formulates the main objectives
and subobjectives of the research study. In fact, objectives grow out of the research questions.
There is a difference between objectives and research questions. Research questions are
written as interrogative statements whereas objectives result from the
transformation of those questions by using some action-oriented words such as ‘to find out’, ‘to
explore’, ‘to determine’, ‘to examine’, etc. Thus objectives are derived from questions. Some
researchers keep themselves satisfied with the research questions only and they don't formulate
objectives. Besides this, there are researchers who proceed in reverse order, that is, they
formulate objectives first and then formulate the research questions.
6. Assessing the objectives: At this stage, the researcher examines the feasibility of
achieving those objectives in light of his available financial and human resources as well as in
terms of technical expertise available.
7. Doing double-check: This is the final step in formulating a research problem where the
researcher goes back and rethinks whether or not he is really interested in the objective of the
study and has sufficient and adequate resources to undertake it. At this stage, the researcher usually
asks questions like this: Do I have enough resources to undertake this study? Am I really
interested and enthusiastic in undertaking this study? If the answer to any such question is ‘no’,
the researcher tries to reassess his objectives.
Thus in formulating a research problem, a researcher has to go through these various steps
sincerely and enthusiastically.


MEANING AND CHARACTERISTICS OF A GOOD HYPOTHESIS 


In conducting research the next step after the selection of the problem is to formulate a hypothesis
(pl. hypotheses). What is meant by a hypothesis? As we know, any scientific investigation starts
with the statement of a solvable problem. When the problem has been stated, a tentative solution,
in the form of a testable proposition, is offered by the investigator. This testable proposition is
called a hypothesis. Therefore, a hypothesis is nothing but a suggested testable answer to a
problem. Enlarging on this meaning of a hypothesis, we may say: A hypothesis is a testable
relationship between two or more than two variables. Several experts have defined hypothesis in
the same way. For example, the viewpoints of two leading experts can be quoted: McGuigan
(1990, 370) has defined hypothesis as, “a testable statement of a potential relationship between
two (or more) variables, that is advanced as potential solution to the problem.” Kerlinger (1973,
18) has defined hypothesis as, “a conjectural statement of the relation between two or more
variables. Hypotheses are always in declarative sentence form, and they relate, either generally
or specifically, variables to variables.” On the basis of these definitions, two points can be
suggested about a hypothesis. First, a hypothesis is a testable statement, which means that it
displays the relationship between those variables which are measurable or potentially
measurable. Second, a hypothesis exhibits either a general or specific relationship between
variables.

The above interpretation may produce an impression that problems and hypotheses are
identical. In fact, there are some basic distinctions between a problem and a hypothesis. First,
hypotheses are testable whereas problems are not. For example, “What is the relationship
between IQ and classroom achievement?” is a problem which cannot be tested directly. When we
deduce a hypothesis from this problem in the light of the available facts and theories, we can test
it. Thus the hypothesis, “IQ is positively related with classroom achievement” can be tested
directly. Second, problems are stated in the form of interrogative sentences whereas hypotheses
are stated in the form of declarative sentences.

A hypothesis indicates what we are looking for. It is a proposition which always looks
forward. After testing the hypothesis, the investigator may find it to be correct or incorrect. If it
proves to be correct, the problem (stated earlier) is solved and if it proves to be incorrect, the
problem is not solved.
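How a declarative hypothesis such as “IQ is positively related with classroom achievement” might be put to an empirical test can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: the eight paired scores are hypothetical, and a one-sided permutation test with a 0.05 significance level is assumed here simply because it needs nothing beyond the Python standard library.

```python
import random

# Hypothetical paired observations: IQ and classroom achievement scores.
iq = [95, 102, 108, 112, 118, 125, 99, 130]
achievement = [52, 58, 61, 70, 74, 83, 60, 80]


def cross_product_sum(xs, ys):
    """Sum of cross-products of deviations; larger means a stronger positive link."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """One-sided p-value: share of shuffles at least as positively related as observed."""
    rng = random.Random(seed)
    observed = cross_product_sum(xs, ys)
    shuffled = list(ys)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real pairing between the two variables
        if cross_product_sum(xs, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_shuffles + 1)


p = permutation_p_value(iq, achievement)
alpha = 0.05  # conventional significance level, assumed for illustration
decision = "supported" if p < alpha else "not supported"
print(f"p = {p:.4f}; the hypothesis of a positive relation is {decision} by these data")
```

The question form of the problem offers nothing to compute, whereas the declarative hypothesis names a direction (positive) that the data can either bear out or fail to bear out.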
When a hypothesis has been formulated, the investigator must determine whether or not the
formulated hypothesis is good. There are several criteria or characteristics of a good research
hypothesis. A good hypothesis is one which meets such criteria or incorporates such
characteristics to a large extent. Some of these characteristics are enumerated below.

1. The hypothesis should be conceptually clear: A good research hypothesis is one which is
based upon operationally defined concepts. Not only this, the definitions must be given in
commonly-accepted and communicable words so that the complete hypothesis is conceptually
clear to any investigator.

2. The hypothesis must be testable: It should be formulated in a way that can be tested
directly and found to be probably true or probably false. A hypothesis like, “Democratic
atmosphere in an educational institution improves the creative thinking of the pupils,” is an
example of a hypothesis which is too broad and hence cannot be tested directly. Such a
hypothesis, however, is potentially testable.

3. The hypothesis should be economical and parsimonious: If several hypotheses are
offered to test a research problem, the more economical and parsimonious ones should be
preferred to hypotheses involving higher monetary expenses.

4. The hypothesis should be related to the existing body of theory and fact: If the investigator
advances a hypothesis which seems to him of interest but which is not related to the existing
body of theory and facts, it cannot be a good research hypothesis. If, for example, the investigator
develops a hypothesis that skin colour produces a difference in intelligence, then it may be an
interesting hypothesis but may not be scientifically sound because no theory supports it.

5. The hypothesis should have logical unity and comprehensiveness: If several hypotheses
can be formulated regarding the same research problem, the most logical and comprehensive
one should be preferred.

6. The hypothesis should be general in scope: A general hypothesis permits several
deductions and thus explains several facts at a time. Therefore, a general hypothesis should be
preferred. However, very broad or very general hypotheses cannot be good research hypotheses
because such hypotheses are often vague and cannot be tested.

7. The hypothesis should be related to available scientific tools and techniques: A
hypothesis about which data cannot be collected because no scientific tools or techniques are
available cannot be a good research hypothesis.

8. The hypothesis should be in accord with other hypotheses of the same field: While this is
not an essential condition, if any hypothesis satisfies this criterion, it can be claimed to be a good
research hypothesis. A hypothesis that contradicts other hypotheses of the same field can also be
regarded as a good hypothesis provided it is followed by scientific rationale which, in turn, has
experimental support.





FORMULATING A HYPOTHESIS
It is difficult to tell precisely how a scientist formulates a hypothesis because the process of
formulation itself is vague and idiosyncratic. Goode & Hatt (1952) have pointed out three
possible difficulties in the formulation of a good research hypothesis. First, the absence of knowledge
of a theoretical framework is a major difficulty in formulating a good research hypothesis. If
detailed theoretical evidences are not available or if the investigator is not aware of the
availability of those theoretical evidences, a research hypothesis cannot be formulated. Second,
when the investigator lacks the ability to utilize the knowledge of the theoretical framework, a
hypothesis cannot be formulated. Third, when the investigator is not aware of the important
scientific research techniques, he will not be able to frame a good research hypothesis.

Despite these difficulties, the investigator attempts in his research to formulate a hypothesis.
Usually the hypothesis is derived from the problem statement. The hypothesis should be
formulated in a positive and substantive form before data is collected. In some cases additional
hypotheses may be formulated after data has been collected, but they should be tested on a new
set of data and not on the old set which has suggested them. The formulation of a hypothesis is a
creative task and involves a lot of thinking, imagination and the like. Reichenbach (1938) has
made a distinction between the two processes found commonly in any hypothesis-formulation
task. One is the context of discovery and another is the context of justification. The manner or the
process through which a scientist arrives at a hypothesis illustrates the context of discovery, and
the presentation of evidence or proof in support of the truth of the hypothesis illustrates the
context of justification. A scientist is concerned more with the context of justification in the
development of a hypothesis. He never puts his ideas or thoughts as they nakedly occur in the
formulation of a hypothesis. Rather, he logically reconstructs his ideas and thoughts and draws
some justifiable inferences from those ideas and thoughts. He never cares to relate how he
actually arrived at a hypothesis. He does not say, for example, that while he was shaving, this
particular hypothesis occurred to him. He usually arrives at a hypothesis by the rational
reconstruction of thoughts. When a scientist reconstructs his thoughts and communicates them
in the form of a hypothesis to others, he uses the context of justification. When he arrives at a
hypothesis, he extensively as well as intensively surveys a mass of data, abstracts them, tries to
find out similarities among the abstracted data and finally makes a generalization or deduces a
proposition in the form of a hypothesis.


For example, consider the following situation in Pavlovian conditioning. A bell is sounded
and immediately after that, meat powder is presented to a dog. The dog starts salivating. After a
number of presentations of the bell and food, the dog salivates at the mere presentation of the
sound of the bell. In such a situation the experimenter observes that all the instances of salivation
made in each trial are similar to each other and hence, they are seen by him as belonging to
one general class of the salivation response. Likewise, the sound of the bell is seen by him as
forming one general class because all the sounds of the bell are similar enough to form a class.
Thus the scientist uses classification for distributing a mass of data into a smaller number of
categories so that they can be effectively handled. Next, he tries to find out the relationship
between the classified data so that a hypothesis can be formulated. For example, the
experimenter may now formulate the hypothesis: 'After so many repetitions of the bell and food,
the dog will regularly respond to the mere sound of the bell.'

The above example illustrates the typical processes through which a scientist, at least a
behavioural scientist, proceeds before arriving at a hypothesis. However, there is no dearth of
scientists who proceed in a haphazard way in formulating a hypothesis.




WAYS OF STATING A HYPOTHESIS
A hypothesis is a testable statement of the relationship between the variables under study. Since it
is testable, it can be shown whether a hypothesis is true or false. When we are aware of the
characteristics of a hypothesis, we must know how to state a hypothesis. A hypothesis
should be stated in the logical form of the general implication. A hypothesis in the form of the general
implication may be expressed in terms of an "If... then..." relationship or "If a then b". In other
words, if a condition holds true then some other conditions also hold true. For example, if the
reward is given for learning a task then the learning is improved. In general implication, the a
condition is referred to as the antecedent condition and the b condition is referred to as the
consequent condition. The antecedent as well as the consequent condition is expressed in the
form of propositions and therefore, the hypothesis expresses the relationship between the above
two types of propositional statements. In stating the hypothesis as general implications, there
occur two common misunderstandings regarding the antecedent condition and the consequent
condition. First, it is generally misunderstood that the antecedent condition causes the
consequent condition. This is not always true. The general implication only says that if the
hypothesis is probably true then the antecedent condition and the consequent condition will
occur together and not that one causes the other. Second, it is commonly misunderstood that the
consequent conditions are always true. But in reality, the general implication does not say so. It
only states that if the antecedent conditions are true then the consequent conditions are also true.
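
The two misunderstandings described above can be checked against a truth table. The following is only an illustrative sketch: it assumes that the general implication is read as the material conditional of elementary logic, and the function name `implies` is hypothetical and not taken from the text.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional 'if a then b': false only when a is true and b is false."""
    return (not a) or b


# Enumerate every combination of antecedent (a) and consequent (b).
truth_table = {(a, b): implies(a, b) for a, b in product([True, False], repeat=2)}

for (a, b), value in truth_table.items():
    print(f"a={a!s:<5} b={b!s:<5} 'if a then b' = {value}")

# Rows in which the implication holds although the consequent is false:
# the implication by itself does not assert that the consequent is true.
holds_with_false_consequent = [ab for ab, v in truth_table.items() if v and not ab[1]]
```

Under this reading, only the row with a true antecedent and a false consequent contradicts the hypothesis, and nothing in the table refers to one condition causing the other.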

Now the question is: Are we following Russell's standard general implication in stating the
hypothesis? The answer is probably 'no'. Recent publications of scientists have clearly shown
that they are not following the "If... then..." form. Instead, they state the hypothesis in the form of
a statement which is direct and does not contain the "If... then..." form. For example:

(i) Reward improves learning.

(ii) The onset of fatigue reduces the efficiency of workers.

If we are not conforming to the advice of Russell, we are not necessarily committing a
serious error in stating the hypothesis because the above type of hypothesis can be restated in the
"If..., then..." form of Russell also. For example, "If a reward is given, learning is improved" and
"If fatigue occurs, the efficiency of the worker is reduced."

It is obvious that the hypothesis can be formulated in different ways.


TYPES OF HYPOTHESES 
On the basis of the degree of generality, research hypothesis can be divided into two types. 

(a) Universal hypothesis 

(b) Existential hypothesis 

A universal hypothesis is one in which the stated relationship holds good for all the levels or
values of variables which are specified for all time at all places. 'Adequate level of light increases
reading efficiency' is an example of universal hypothesis.

Existential hypothesis is one which states that the relationship stated holds good for at least
one particular case. For example, 'There is at least one schizophrenic who does not have either
delusion or hallucination' is an example of existential hypothesis.

Of these two types of hypotheses, the universal hypothesis is preferred because such a
hypothesis has a greater predictive power than the existential hypothesis.
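
The difference in predictive power can be seen in how each type of hypothesis is evaluated against a set of cases. The sketch below is an illustration only; the list `observations` is made-up data, and treating each case as a simple true/false outcome is an assumption made for brevity.

```python
# Hypothetical observations: True means the stated relationship held in that case.
observations = [True, True, False, True]

# A universal hypothesis claims the relationship holds in every case,
# so a single contrary case is enough to refute it.
universal_supported = all(observations)

# An existential hypothesis claims the relationship holds in at least one case,
# so a single confirming case is enough to support it.
existential_supported = any(observations)

print("universal:", universal_supported)
print("existential:", existential_supported)
```

Because the universal hypothesis makes a prediction about every case, it says more and is easier to refute, which is one way of reading its greater predictive power.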

As we know, hypothesis is a formally stated expectation about a behaviour that defines the
purpose and goals of the study being conducted. Based upon the goals of explaining and
controlling the causes of behaviour, there are two types of hypothesis.




1. Causal hypothesis
2. Descriptive hypothesis

1. Causal hypothesis: A causal hypothesis postulates a particular causal influence on
behaviour. In other words, it tentatively explains a particular influence on, or a cause for, a
particular behaviour. For example, if the researcher hypothesizes that boring contents of
advertisements is the cause of channel changing by TV viewers, it is an
example of causal hypothesis. Although it is a fact that boring contents may not be the
only cause of the channel-changing behaviour, it is the probable cause we are investigating
at the moment.

2. Descriptive hypothesis: Descriptive hypothesis is one that postulates particular
characteristics of a behaviour or provides some specific goal for the observation. In fact, such
hypothesis tentatively describes a behaviour in terms of the characteristics or the situation in
which it occurs. Such hypothesis identifies the various characteristics or attributes of behaviour
and allows us to predict when it occurs. For example, the researcher hypothesizes that channel
changing during TV viewing occurs more frequently when the person is alone than when he is
watching with others. The reality may be that even the number of people present might partially
cause channel changing, and the researcher has not stated that. In this way, it can be said that a
descriptive hypothesis simply describes the behaviour in terms of the various characteristics of
the situation and it does not attempt to identify the causes of a behaviour.

Apart from these, the other types of hypotheses that we commonly use in behavioural
researches are simple hypothesis, complex hypothesis, research hypothesis, alternative
hypothesis, null hypothesis and statistical hypothesis. These may be described as under.

1. Simple hypothesis: Simple hypothesis contains only one or two variables. For example,
hypotheses like children from broken homes tend to become delinquent, reward improves
learning, aggression is associated with frustration are all examples of simple hypotheses. In all
these hypotheses, the relationship between only two variables has been postulated. Hence,
they are examples of simple hypotheses.

2. Complex hypothesis: Complex hypotheses are hypotheses which contain more than two
variables and therefore, require complex statistical calculation too. Such hypotheses are called
complex because the interrelatedness of more than two variables acting simultaneously is more
difficult to assess quantitatively and theoretically. A hypothesis like children from upper and
lower socio-economic status have larger adult adjustment problems than children from middle
socio-economic status is an example of a relatively complex hypothesis.





3. Research hypothesis: A hypothesis derived from the researcher's theory about some
aspects of behaviour is called a research hypothesis or is also known as a working hypothesis.
The researcher believes that his research hypotheses are true or that they are accurate statements
about the conditions of things he is investigating. He also believes that these hypotheses are true
to the extent that the theory from which they were derived is adequate. In this perspective, Siegel
and Castellan (1988) have defined research hypothesis as, "the prediction derived from the
theory under test."


4. Alternative hypothesis: The formulation of an alternative hypothesis (H1) is a convention
in scientific research. Alternative hypothesis is defined as the operational statement of the
investigator's research hypothesis. It functions as an alternative to null hypothesis and typically
asserts that the independent variable has an influence upon dependent variable. If null
hypothesis is rejected by the results of the experiment, alternative hypothesis is accepted and the
research hypothesis is strengthened. Alternative hypothesis is sometimes also called
experimental hypothesis or statistical hypothesis.

5. Null hypothesis: A null hypothesis (H0) is, in a sense, the reverse of a research hypothesis. It
is, in fact, a no-effect or no-difference hypothesis or negation hypothesis that tends to refuse or deny
what is explicitly indicated in a given research hypothesis. Generally, the experimenter or
researcher's aim is to refute this hypothesis on the basis of the obtained results so that its reverse,
that is, the research hypothesis can be supported or confirmed.

The interrelatedness of research hypothesis, null hypothesis and alternative, or statistical,
hypothesis can be explained through an example: Suppose a certain social-psychological theory
would lead us to predict that two specified groups of people would differ on the measure of
intelligence. This prediction would be our research hypothesis which would state that the two
groups differ. Confirmation of this hypothesis would lend support to the theory from which it was
derived. To test this research hypothesis, we state it in an operational form as the alternative
hypothesis, that is, H1. One operational way to state this alternative hypothesis would be that
mean intelligence scores of these two groups differ or are unequal. The null hypothesis (H0)
would be that the mean intelligence score of the two groups is the same. If the data permit us to
reject H0 then H1 would be accepted because the data support the research hypothesis and its
underlying theory.

In fact, the nature of the research hypothesis determines how the alternative hypothesis (H1)
should be stated. If the research hypothesis simply states that two groups will differ with respect to
mean then the alternative hypothesis would be simply that the means of the two groups are not equal.
But if the research hypothesis predicts difference with direction, that is, one specified group will
have a larger mean than the other, then the alternative hypothesis may be that the mean for group 1
is greater or less than the mean for group 2.
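
The relationship between the null hypothesis, the non-directional alternative and the directional alternative can be sketched with a small computation. The text does not name a particular statistical test, so the permutation test used below is a substitute chosen only because it needs no external library. The scores are made-up data, and the function name `permutation_test`, its parameters and the 0.05 significance level are assumptions for illustration.

```python
import random
from statistics import mean


def permutation_test(group1, group2, alternative="two-sided", n_resamples=10_000, seed=0):
    """Approximate p-value for H0: both groups have the same mean.

    alternative="two-sided": H1 states that the two means differ.
    alternative="greater":   H1 states that the mean of group1 is larger.
    alternative="less":      H1 states that the mean of group1 is smaller.
    """
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_resamples):
        # Under H0 the group labels are interchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if alternative == "two-sided":
            extreme += abs(diff) >= abs(observed)
        elif alternative == "greater":
            extreme += diff >= observed
        elif alternative == "less":
            extreme += diff <= observed
        else:
            raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    # The +1 terms count the observed arrangement itself among the resamples.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical intelligence scores for two groups (illustrative data only).
group1 = [112, 108, 120, 115, 118, 110, 117, 113]
group2 = [101, 99, 105, 98, 104, 100, 103, 97]

alpha = 0.05  # assumed significance level
p_two_sided = permutation_test(group1, group2, alternative="two-sided")
p_greater = permutation_test(group1, group2, alternative="greater")
p_less = permutation_test(group1, group2, alternative="less")

print("Reject H0 in favour of 'means differ':", p_two_sided < alpha)
print("Reject H0 in favour of 'group 1 mean is greater':", p_greater < alpha)
print("Reject H0 in favour of 'group 1 mean is smaller':", p_less < alpha)
```

With these made-up scores the non-directional alternative and the 'greater' alternative lead to rejection of H0, while the 'less' alternative does not, which shows why the direction stated in the research hypothesis has to be fixed before the data are examined.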


SOURCES OF HYPOTHESES 
In social science researches, a researcher generates a hypothesis from several sources. Some of
these important sources are as follows:

1. One primary source of generating a hypothesis is various opinions, observations and
experiences. A researcher bases his hypothesis on these sources so long as he conducts an
empirical and objective study to provide evidence for testing the hypothesis.

2. Another source of a hypothesis is retesting of a hypothesis previously tested by some
other researcher.

3. A third source of generating a hypothesis is the existing research itself. A talented
researcher, while reading about the results of a reported study which tested one or more
hypotheses, often generates several additional hypotheses. For example, on reading
that positive reinforcement improves learning, he may
also try to identify the factors that modify this influence.

4. A fourth source of a hypothesis is theory. A theory, as we know, is a logically organized set
of proposals that defines, explains and organizes our knowledge about behaviours. In
fact, a theory includes abstract concepts that apply to a broad range of behaviours in a
parsimonious way. For example, Freudian theory tried to explain normal
and abnormal behaviours using several abstract concepts like the
subconscious mind, etc. Theoreticians generally proceed in two ways. First, they
may develop a theory beginning mainly with some concepts and with little or no
scientific evidence. In such situations, a theory provides direction regarding the evidence that
researchers will seek. Second, theoreticians may develop a theory after evidence has
been collected. In such situations, a theory provides a way to organize the evidence. In
either of these two situations, researchers derive a hypothesis from the theory, conduct a
study, and then apply the results of the study back to the theory, thereby supporting,
adding to or modifying the theory. In this process, the researcher
generates some additional hypotheses.



5. The fifth source of a hypothesis is a model, which is defined as a generalized hypothetical
description that, by analogy, explains the process underlying a set of common
behaviours (Heiman 1995). The basic difference between a theory and a model is that a theory
accounts for broad, abstract components of behaviour, whereas a model provides a
concrete analogy for thinking about the various concepts and how they operate. One
basic characteristic of a psychological model is that it involves a flow chart or diagram of a
certain psychological process. For example, one model describes human
memory with the help of a flow chart having three separate stages that represent the flow of
information through short-term and long-term memory (Atkinson & Shiffrin 1968). The
explanation of the model provides a useful analogy for deriving specific hypotheses about how
and when information remains in memory for the time being or more permanently. Subsequently,
these hypotheses are tested and the obtained results are used to bring a modification in the
model, if needed.

In this way we find that there are several sources from which the researcher derives
hypotheses for his study.


FUNCTIONS OF HYPOTHESES
For a behavioural scientist, a hypothesis performs the following major functions.

1. Hypotheses test various theories: In behavioural research the researcher develops a
theory to account for some phenomenon and then, he devises a means whereby the theory can
be tested. He seldom tests the theory directly. Most of the time he conducts tests of hypotheses
that have been generated and derived from that theory. If the hypotheses test out as specified by
the researcher, it is said that his theory is supported in part. Thus one of the major functions of
hypotheses is to make it possible to test theories. In this light, an alternative definition of a
hypothesis can be the statement of theory in a testable form. All statements of theory in testable
form can be called hypotheses.










2. Hypotheses suggest various theories: In behavioural research, it is often found that
some hypotheses are not associated with any particular theory. It is just possible that as a result of
some hypothesis, a theory may eventually be constructed. Therefore, another function of
hypotheses is to suggest theories that may account for some event.

Although it is a common practice that the researcher proceeds from theory to hypotheses,
occasionally the reverse is true. The researcher may have some idea about why a given
phenomenon occurs and he may hypothesize a number of things that relate to it. He may find that
some hypotheses have greater potential than others for explaining the event or particular
behaviour, and as a result, he may construct a logical system of propositions, assumptions and
definitions linking his explanation to the event. In other words, it can be said that he has devised a
theory.


3. Hypotheses tend to describe social phenomena: A hypothesis also performs a descriptive
function. When a researcher tests a hypothesis empirically, it tells him something about the
phenomenon it is associated with. If the hypothesis is supported then his information about the
phenomenon increases. When the hypotheses are refuted, the test tells us something about the
phenomenon which we did not know before. The accumulation of information as a result of
hypothesis testing in that way reduces the amount of ignorance of the researchers.

Apart from these primary functions, hypotheses also have some important secondary
functions. In fact, testing hypotheses tends to refute some 'common sense' notions about
human behaviour, raises questions about the explanations we presently use to account for
things, and alters our orientation towards the environment to one degree or another.
Besides these, as a result of testing certain hypotheses, social policy may be formulated in
communities, delinquents and offenders may be treated differently, teaching methods may be
modified and improved, solutions to various kinds of problems may be suggested and penal
institutions may be redesigned and revamped. Besides, a hypothesis also tells the researcher what
data to collect and what not to collect, thereby providing focus to the research study.


Review Questions
1. What is a hypothesis? Point out the characteristics of a good research hypothesis.
2. Explain the significance of hypothesis in research. What are the main considerations in
   formulation of a hypothesis? State them.
3. What is the difference between a research problem and a research hypothesis? State the
   sources of research problem.
4. Discuss the major steps in formulating a research problem.
5. Discuss the main sources of formulation of a hypothesis. Also, point out the importance
   of different types of hypotheses in behavioural researches.
6. Citing relevant examples, discuss the ways in which a research problem is manifested.
7. Write short notes:
   (a) Types of problems
   (b) Types of hypotheses
   (c) Important considerations in selecting a problem




19 


REVIEWING THE LITERATURE 


CHAPTER PREVIEW

• Purpose of the Review
• Types of Literature Review
• Sources of the Review
    Journals and Books
    Reviews
    Abstracts
    Indexes
    Internet
    Doctoral Dissertations
    Supervisors/Research Professors
• Types of Literature
• Writing Process of the Literature Review
• How Old should the Literature Be?
• Preparation of Index Card for Reviewing and Abstracting


Abstract 





PURPOSE OF THE REVIEW
A collective body of works done by earlier scientists is technically called the literature. Any
scientific investigation starts with a review of the literature. In fact, working with the literature is
an essential part of the research process which generates the idea, helps in developing significant
questions and is regarded as instrumental in the process of research design. The main objectives
of a review of the literature are enumerated below.

1. Identifying variables relevant for research: When the researcher makes a careful review
of the literature, he becomes aware of the important and unimportant variables in the concerned
area of research. A careful review also helps the researcher in selecting the variables lying within
the scope of his interest, in defining and operationalizing variables and in identifying variables
which are conceptually and practically important. Thus a review of the literature, on the whole,
prepares the researcher to formulate a researchable problem in which conceptually and
practically important variables are selected.

2. Avoidance of repetition: A review of the literature helps the researcher in avoiding any
duplication of work done earlier. A careful review always aims at interpreting prior studies and
indicating their usefulness for the study to be undertaken. Thus prior studies serve as the
foundation for present studies. In some cases, the duplication or replication of prior studies
becomes essential. This is especially true when the researcher wants to test the validity of
earlier studies. In such a situation, too, a careful review helps the researcher in getting acquainted
with the number and nature of the studies related to the study whose validity is being assessed
at present.










3. Synthesis of prior works: A careful review of the literature enables the researcher to
collect and synthesize prior studies related to the present study. This, in turn, helps him in
building a better perspective for future research. A synthesized collection of prior studies also
helps a researcher to identify the significant overlaps and gaps among the prior works.

4. Determining meaning and relationship among variables: A careful review of the literature
enables the researcher in discovering important variables relevant to the area of the present
research. When significant variables are discovered, the relationship among them can be
identified. Subsequently, the identified relationship is incorporated into different hypotheses.
Thus, for conducting a scientific study, the relationship between the different variables must be
explored by reviewing the literature so that a good context may be built up for
subsequent investigations.

In addition to these specific purposes, there are some general purposes of the literature review:
(i) To argue for the relevance and the significance of the research question
(ii) To provide the context for one's own methodological approach
(iii) To establish one's own credibility as a knowledgeable and capable researcher
(iv) To argue for the relevance and appropriateness of one's own approach


TYPES OF LITERATURE REVIEW 


As we know, the basic aim of review of literature is to integrate and summarize what is known in
the area of interest. There are many types of literature review. Some of the common and popular
types are given below.

(a) Historical review: This is a type of research review in which the researcher traces an
issue over time. Such review is sometimes merged with a theoretical or methodological
review to demonstrate how a concept, theory or research method has developed over
time.

(b) Context review: Here the researcher tries to link a specific study to the larger body of
knowledge. Such review often appears at the beginning of a research report and
introduces the study within a broader framework and thus it continues to build on
developing a line of thought.

(c) Integrative review: It is a very common type of review where the researcher presents and
summarizes the existing state of knowledge regarding a topic or theme, and highlights
agreements and disagreements within it. Sometimes it is combined with context review
or may be published as an independent article as a probable service to new researchers.

(d) Methodological review: It is a specialized type of integrative review where the
researcher tends to compare and evaluate the relative methodological strength of various
studies and shows how different methodologies, such as research design, samples, etc.,
account for different types of results. Meta-analysis is often used in methodological review.

(e) Self-study review: A type of review where the researcher displays his or her familiarity
with the topic of interest. It is often part of an educational programme.

(f) Theoretical review: It is a type of specialized review where the researcher presents
several concepts, models and theories which concentrate on the same topics and compares
all of them on the basis of assumptions, scope of explanation, logical consistency, etc.

Of these various types of review of literature, integrative review, context review and
theoretical reviews are comparatively popular.


SOURCES OF THE REVIEW 


There are different sources of the review of literature. Some of them are enumerated below.





Journals and Books
Different research journals and books relevant to the areas of interest are the primary sources of
the literature review. Most major libraries have a periodical section where different types of
research journals are available. A research journal generally contains research
reports with detailed methodology and results. Such journals may be refereed or
non-refereed journals. A refereed journal is one which
reports only those articles which are carefully reviewed by the experts before publication.
The reviewer rejects several manuscripts and selects some for publication. Similarly, books are
also direct sources of the literature review and they include textbooks, handbooks, yearbooks,
dictionaries, etc. Journals are regarded as more useful because they provide the
researcher with the latest and up-to-date information relevant to the area of interest.


Reviews
Reviews are short articles that give brief information regarding the work done in a particular area
over a period of time. Reviews are commonly published in journals, yearbooks, handbooks and
encyclopaedias. Reviewers select research articles of their interest, organize them contentwise,
criticise them and offer their own suggestions and conclusions. Review articles are a
good source for those investigators who wish to have all the relevant researches at one place
without having to search for them. Since the reviewers organize all the possible researches
of a given area in their review articles, review articles also provide the advantage of
prior reviews.














Abstracts
Abstracts provide a summary of the research reports done in different fields. Psychological
Abstracts (Washington: American Psychological Association), first published in 1927, and
Sociological Abstracts (New York: Sociological Abstracts, Inc.) are the two common examples of
abstracts. Other sources of abstracts are Educational Resources Information Centre (ERIC),
Dissertation Abstracts International, Children Research Abstract and the Indian Educational
Abstracts (published by NCERT, New Delhi). These abstracts are useful sources of up-to-date
information for researchers. In an abstract, besides a summary, researchers get all the relevant
information such as the title of the research report, name of the author and the journal pagination
information, etc., regarding the research article. The only limitation of abstracts is that they fail to
satisfy those researchers who desire detailed information regarding the methodology and results
of the research articles.





Indexes
Indexes show the titles of the research reports without any abstract. The titles are categorized and
arranged alphabetically in each category so that the researcher can locate any article of interest
easily. The Education Index (New York: H W Wilson Co.) is a good example of an index.
Since indexes provide only limited information, they keep many a researcher dissatisfied. They can
be best used as a supplementary source which, if combined with other sources, can provide
valuable information to the researchers.

Internet
Today the internet is a very easy and quick source of literature review. Internet sites are very
useful for gaining easy access to writings by important researchers. They also provide
updated information that ordinarily is not available in the library. Internet sites also
provide useful bibliographies related to a particular researcher. Search on the internet also leads
to professional societies or academic associations which can provide
support to the studies in the concerned area. Such organisations also publish
important papers or periodicals which can be of immense help to the researcher.
Publishers put the brief content and extracts from recently published books on the internet.
These can be of valuable help to the researchers. Sometimes, the internet sites include material
extracted from encyclopedias which can also be very useful and informative as background
reading. However, they are not normally suitable for citing in a thesis.


Doctoral Dissertations
Doctoral dissertations have also been a very good source of the review of the literature. In
libraries of universities, doctoral dissertations are available. The researcher can choose the
dissertations of his interest and find useful and relevant information there. There are no set
forms of writing the research report in a doctoral dissertation but most dissertations contain
chapters like an introduction, review of the literature, purpose of the study, method of the study,
results, discussion, summary and conclusion. Some researchers prefer not to add a separate
chapter on the review of the literature and hence, it is incorporated into the introduction itself.
Thus the doctoral dissertations present the advantage of prior review. Ordinarily, it is not possible
for the researcher to move through all the important libraries in the country to consult all existing
doctoral dissertations. Hence, he can have access to those dissertations that interest him through
Dissertation Abstracts International, which publishes the abstracts of the doctoral dissertations
submitted to different universities. In India, the Survey of Research in Education (edited by
M B Buch) performs much the same function. The second Survey of Research in Education covering
the period between 1972-78 has been released. Recently, the listing of dissertation abstracts has
been computerized through DATRIX in terms of the key words (usually words appearing in the
title of the dissertation).


Supervisors/Research Professors
Supervisors often know the literature well and are able to guide researchers in the right direction.
They are the recognised authority on the topic or research problems. Therefore, they should be
consulted and their suggestions and advice should be carefully analyzed. It may also be that
other research professors have recently sourced and reviewed the literature, or an area very close
to the literature, that the researcher is seeking. So they also constitute one important source.

Whatever may be the sources of reviews, the process of reviewing literature itself is not
above criticism. Inevitably the interpretation of findings, the insights derived and the manner in which
conclusions are drawn are all solely dependent upon the judgements of the reviewer. In other
words, such reviews fall prey to what is called subjective judgement.


TYPES OF LITERATURE 
In order to work with appropriate literature, it is essential that the researcher must be able to identify and find it. For this, he must have an understanding of various literature types. Some of the common types of literature are given below.

Subject-specific books

Introductory and advanced textbooks and research reports can provide important background and context for the research. Such literature also provides information about the theory and method of the research.

Grey literature

By grey literature is meant both published and unpublished materials that somehow don't have an International Standard Book Number (ISBN) or an International Standard Serial Number (ISSN). This broad category includes unpublished research theses and newspapers, among other materials.











Official publications, archives and statistics

This type of literature serves a dual purpose. Firstly, such literature can be a valuable source of background and contextual information and secondly, it can also be used as secondary data. Document and secondary data analysis are often based upon this type of literature.

| 
Writing aids

As its name implies, such literature generally offers a significant support during the process of writing and can be easily used to improve the linguistic style of the work. Such literature includes dictionaries, bibliographic works, encyclopedias, thesauruses, yearbooks, books of quotes, almanacs, etc.


Journal articles 


This type of literature is very common among the researchers. Its popularity is due to several factors. First, journal articles are very credible. Second, they are often targeted at an academic audience. Third, they possess the trait of speciality. Fourth, they possess the regularity of production which means that research articles are not only relevant but also current.


Of these various types of literature, the relevant one is searched by the researcher. 


Although reviewing literature is time-consuming and daunting, it is also rewarding because it serves several functions. Of these functions, the following ones are important:


(i) It provides a strong theoretical background to the study. 
(ii) It helps the researcher in contextualising the findings of the study. 


(iii) Reviewing literature helps the researcher to show how his findings have contributed to the existing body of knowledge.
(iv) Reviewing literature also helps the researcher in establishing a link between what he is proposing and what has already been studied. In simple words, it helps the researcher in refining the research methodology.
(v) Reviewing literature also helps the researcher in clarifying and focusing on the research problem.


WRITING PROCESS OF THE LITERATURE REVIEW

Since the literature review may be a very long chapter, it does need some form of structure. The simplest way of organizing the research works is to discuss them in chronological order. But this way may not be appropriate in all situations. Another way may be that works on different subjects are grouped together with the date of publication as the only criterion of order. Still another way may be to base the structure on the different types of publication. For example, chapters from books, journal articles and books by single authors should be separately grouped and structured. The basic aim of the literature review is to use the literature for informing, establishing and arguing. In fact, a literature review should go beyond a simple report. O'Leary (2004) has recommended that for writing a good literature review, the following steps should strictly be followed:

(i) To read a few good, relevant reviews The researcher should take a look at the literature reviews done in several theses and journal articles. From these reviews, good and relevant reviews should be sorted and this depends upon the research skills of the researcher. The supervisor should help him in selecting the relevant and good reviews.








(ii) To write critical annotations While going through various reviews, the researcher should begin to sort and organize the annotations of the reviews by themes, issues of concern and common limitations, etc. While he is doing so, some patterns would start emerging and this would, in turn, help him in developing his own argument.


(iii) To develop a structure The researcher should structure the potential reviews according to his most urgent needs such as topical themes, arguments he wishes to establish, etc. The structure so developed is always subject to modification with the emergence of new thinking.

(iv) To write purposefully The researcher should note that he can review the literature 
without any agenda but he cannot write a formal literature review without any definite agenda or 
aim. The readers must know the reasons why and what the researcher is telling them. 


(v) To use the literature to support the argument The researcher should not use the review only for reporting or borrowing the arguments from others, rather he should use the literature for generating ideas that may help or support his own arguments.


(vi) To make the literature review an ongoing process The researcher should make the 
literature review an ongoing process. In other words, his literature review should inform the 
researcher's questions, theories and methods and these should help in setting the parameters of
the literature review. Thus literature review becomes a cyclical process and should often have a
moving target. 


(vii) To get plenty of feedback The researcher should not wait up to the last minute of the writing process. Whatever has been written should be passed over to supervisors and other experts for their feedback. Early feedback gives a chance for rethinking and modification of ideas which can be incorporated in the writing process.

(viii) To remain prepared for redrafting In view of the suggestions through feedback, the researcher should redraft the review in a coherent manner so that his argument is reasonably supported.


Thus writing the literature review is a complex task which can be made easy by following 
certain steps. 


HOW OLD SHOULD THE LITERATURE BE? 


One of the most important questions before the researcher is: how old may the literature be? The 
simplest answer to this question is that it can be of any age. In fact, academic research is a 
cumulative activity. Each generation of researchers learns from the work of the previous generation
and current research basically depends upon the work and insights of the previous researchers.
Since the latest and contemporary research and publications are in vigorous demand in any
society, it is preferable to cite recent publications as far as possible.

Despite this, in almost any discipline, there are some seminal works which are very old
but have become so significant that they are still being preferred by the researchers. Although
their original ideas have been modified by subsequent researchers over time, their original spirit
and views still remain significant and are held in considerable esteem. For example, the work of
Sigmund Freud in the field of psychoanalysis was done over a century back but his ideas,
theories and viewpoints are so pertinent and important that any researcher of today, working in
this field, is bound to begin the review of the literature from his valuable contributions.

However, it would be a healthy suggestion for researchers that they should always take 
precaution in citing older works unless they are confident and convinced in quoting them. 


PREPARATION OF INDEX CARD FOR REVIEWING AND ABSTRACTING 
After going through the different sources of the review of the literature, researchers prepare their own review and abstract on the index card. Usually for this purpose, a 6" x 10" index card is recommended. In most journal articles, an abstract in about 150 words is provided. The researcher can incorporate it in his own abstract. Where the article seems to be very important, the researcher can prepare a more detailed version. Usually, the abstract that the researcher prepares is divided into three parts. The first part consists of the purpose and hypotheses of the study. The researcher should write down the purpose of the study concisely. If the hypotheses are small, they can be recorded verbatim but if they are lengthy, they should be synthesized. The second part consists of the methodology of the study in which the size of the sample, nature of the population, methods for measuring or manipulating the variables, methods of data collection, designs and statistics are shown in synthesized form. The third part consists of the findings and conclusion. In this part, the researcher should briefly write the findings relating to each hypothesis and also the conclusion drawn, concisely. At the top of the index card, a full reference should be clearly written in the same way in which it appears in the researcher's own reference list. There are several types of research formats (which will be discussed in Chapter 24) but that followed by the Publication Manual of the American Psychological Association is widely popular and has been adopted by most of the important research journals. The researcher should never trust his memory for recall of the details of the research article and therefore, all the important and relevant details should be carefully written down in the index card. A sample of the review abstract adjusted to the length of the 6" x 10" index card has been given below.











ABSTRACT

Singh, A K (1980). Extraexperimental Associations in Proactive Interference, Journal of Psychological Researches, 24 (3): 133-39.

Purpose And Hypotheses

This study explores the possibility of extraexperimental associations as one source of proactive interference (PI). There were four hypotheses: (1) US (trigrams having uncommon sequences) produces high amount of PI as compared to CS (trigrams having common sequences) and NS (trigrams having neutral sequences); (2) NS produces higher amount of PI than CS; (3) CW (words of high frequency) produces higher amount of PI than NW (words of middle frequency) and UW (words of low frequency); (4) NW produces higher amount of PI than UW.

Methodology

In this study 30 subjects (all college students) were required to memorize six experimental lists to the criterion of one perfect recall by the method of serial anticipation. Subsequently, the 30 subjects were divided into three groups of 10 each. A recall test was taken after 2 minutes for Gr. I, after 2 hours for Gr. II and after 2 days for Gr. III. The amount of PI occurring over a 2-minute interval (Gr. I) served as the main baseline for comparison. The significance of the mean differences of recall by the three groups was tested through the t test.

Findings And Conclusion

US produced the highest amount of PI and was followed by NS. Among words, CW produced the highest amount of PI whereas UW produced the least amount of PI. The amount of PI in case of NW was intermediate. It was concluded that linguistic associations (both meaningful and nonsense) in subjects before entering the laboratory tend to increase the amount of PI.
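
For researchers who keep their review abstracts electronically rather than on paper cards, the three-part layout of the index card could be mirrored in a small record. The Python sketch below is only an illustration under that assumption; the class name `IndexCard`, its field names and the `render` method are hypothetical and do not belong to any standard tool.

```python
from dataclasses import dataclass


@dataclass
class IndexCard:
    """One review abstract, organised under the three headings of the card."""
    reference: str                 # full reference, as in the reference list
    purpose_and_hypotheses: str
    methodology: str
    findings_and_conclusion: str

    def render(self) -> str:
        """Return the card as plain text with the full reference at the top."""
        parts = [
            "ABSTRACT",
            self.reference,
            "Purpose And Hypotheses",
            self.purpose_and_hypotheses,
            "Methodology",
            self.methodology,
            "Findings And Conclusion",
            self.findings_and_conclusion,
        ]
        return "\n\n".join(parts)


# A shortened version of the sample card, used only as example data.
card = IndexCard(
    reference=(
        "Singh, A K (1980). Extraexperimental Associations in Proactive "
        "Interference, Journal of Psychological Researches, 24 (3): 133-39."
    ),
    purpose_and_hypotheses="Explores extraexperimental associations as one source of PI.",
    methodology="30 college students; recall tested after 2 minutes, 2 hours and 2 days.",
    findings_and_conclusion="US produced the highest amount of PI, followed by NS.",
)
print(card.render())
```

Keeping the reference as a required field reflects the advice that the details of an article should never be left to memory.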








Thus the reviewing and abstracting of the literature on the index card should be done carefully and systematically. Sometimes it has been reported that researchers trust their memory for recalling a particular detail. But this is not a healthy practice because they are apt to forget many details or their memory may be blurred after some time. Researchers should try their best to accommodate every important and relevant detail under the three common headings suggested above.

Review Questions

1. What is meant by the 'review of literature'? Discuss its major purposes.
2. Discuss the major sources of review that are commonly undertaken by the researcher in the field.
3. Discuss the various types of literature. How old should the literature be?
4. Point out the steps involved in writing a good literature review.
5. Discuss the different types of literature review.





20
VARIABLES

CHAPTER PREVIEW

• Meaning and Types of Variables
    Dependent Variables, Independent Variables and Intervening Variables
    Qualitative Variables and Quantitative Variables
    Continuous Variables and Discrete Variables
    Moderator Variables and Intervening Variables
    Active Variables and Attribute Variables
• Difference between a Variable and a Concept
• Methods of Measuring Dependent Variables
• Important Considerations in Selection of Variables
• Important Approaches to Manipulating Independent Variables
• Techniques of Controlling Extraneous Variables
    Technique of Elimination
    Constancy of Conditions
    Balancing
    Counterbalancing
    Randomization
• Controlling Demand Characteristics





MEANING AND TYPES OF VARIABLES 


A variable, as the name implies, is something which varies. This is the simplest and the broadest 


way of defining a variable. However, a behavioural scientist attempts to define a variable more 


precisely and specifically. From his point of view, variables may be defined as those attributes of 
objects, events, things and beings, which can be measured. In other words, variables are the 


characteristics or conditions that are manipulated, controlled or observed by the experimenter, 


Intelligence, anxiety, aptitude, income, education, authoritarianism, achievement, etc., are examples of variables commonly employed in psychology, sociology and education.

Variables have been classified in three different ways.
(i) From the viewpoint of causation
(ii) From the viewpoint of the design of study
(iii) From the viewpoint of unit of measurement
A discussion follows.


Dependent Variables, Independent Variables and Intervening Variables

From the viewpoint of causation, four sets of variables may operate in studies that investigate a causal relationship or association.
• Change variables, which are responsible for bringing change in the phenomena.
• Outcome variables, which are the outcome or effect of the change variable.
• Unmeasured variables, which affect the link between cause-and-effect variables.
• Connecting variables, which in certain situations are considered essential for completing the relationship between cause-and-effect variables.

In the terminology of research methodology, the change variables are called independent variables, the outcome variables are called dependent variables, the unmeasured variables that affect the link in the cause-and-effect relationship are called extraneous variables and the variables that link a cause-and-effect relationship are called intervening variables or sometimes confounding variables (Grinnell 1993).

The terms 'dependent variable' and 'independent variable' have been borrowed from the field of mathematics for use in behavioural researches. The classification of variables into dependent and independent variables is frequently employed in experimental research. The dependent variable (DV) is defined as the variable about which the experimenter makes a prediction. The independent variable (IV) is that variable which is manipulated, measured and selected by the experimenter for the purpose of producing observable changes in the behavioural measure (or DV). In other words, the independent variable is the variable on the basis of which the prediction about the DV is made. The traditional synonym of IV is controlled variable, which is rarely used because of its confusion with control variable. Underwood (1966, 12) calls the IV the stimulus variable and the DV the response variable. An example may illustrate the distinction between the IV and the DV. Suppose the experimenter wants to study the effect of teaching methods upon the classroom achievement of pupils. For this purpose, he may employ three teaching methods, say, A, B and C and may teach the same group of pupils by these three methods and subsequently the achievement may be measured or predicted. Teaching methods constitute the example of the IV and the classroom achievement constitutes the example of the DV. Likewise, suppose the experimenter wants to study the effect of a religious group upon attitude towards family planning. For this purpose, he may take the Hindu, the Muslim, the Sikh and the Parsi as religious groups and study their attitudes towards family planning. Subsequently, he may analyze which religious group has a favourable attitude or unfavourable attitude towards family planning. In this example, the religious groups constitute the example of the IV and the attitude towards family planning constitutes the example of the DV.


Sometimes in the experiment, a variable is left uncontrolled and is allowed to vary with the independent variable. Such a variable is called a confounding variable. Suppose the researcher is to assess the effectiveness of the lecture method vs. the laboratory demonstration method in acquiring knowledge of the fundamentals of a subject with equivalent groups of students. One group is lectured by Professor A and the group with laboratory demonstration is taught by Professor B. Subsequently, both groups receive the same final examination and their performances are compared. Statistical analysis reveals that students taught by Professor B excel in their performance over the students taught by Professor A. Can the researcher conclude that the method of laboratory demonstration is better than the method of lecture? The answer is emphatically no. The big reason is that the method of instruction is not the only way the groups differ in terms of how they have been treated. Another variable (the professor) was allowed to vary along with the method of instruction. It might well be that Professor B is a better instructor, motivates students properly and can explain concepts in understandable terms. On the other hand, Professor A might be less experienced. That is, the difference between groups in performance might be due to the different instructors rather than the different treatments. It is difficult to be sure which interpretation is correct. This is an example of a confounding variable and when a study is confounded, it is impossible to arrive at meaningful conclusions about the results.
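
The confounding in the two-professor example can be made concrete with a little arithmetic. The sketch below uses made-up effect sizes (every number is hypothetical and chosen only for illustration) and assumes a simple additive model of examination scores. It shows that two quite different explanations predict exactly the same observed difference between the groups, so the data alone cannot separate the effect of the method from the effect of the instructor.

```python
def group_mean(baseline, method_effect, instructor_effect):
    """Expected mean score of a group under an assumed additive model."""
    return baseline + method_effect + instructor_effect


BASELINE = 60  # hypothetical mean score with no benefit from method or instructor

# Explanation 1: the demonstration method adds 10 marks; instructors are equal.
lecture_1 = group_mean(BASELINE, method_effect=0, instructor_effect=0)
demo_1 = group_mean(BASELINE, method_effect=10, instructor_effect=0)

# Explanation 2: the methods are equal; Professor B adds 10 marks.
lecture_2 = group_mean(BASELINE, method_effect=0, instructor_effect=0)
demo_2 = group_mean(BASELINE, method_effect=0, instructor_effect=10)

# Both explanations predict the same observed difference between the groups.
difference_1 = demo_1 - lecture_1
difference_2 = demo_2 - lecture_2
print(difference_1, difference_2)
```

Only a design in which method and instructor do not vary together (for example, each professor teaching by both methods) would allow the two effects to be told apart.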

One important thing related to the selection of the dependent variable is the problem of floor and ceiling effects. A floor effect is said to occur when a null result occurs because the majority of subjects score at the bottom of the scale. Such an effect occurs because the task is too difficult for the subjects. For example, if in any experiment on people's sensitivity to lights of different colours, all the coloured lights are too dim to be seen, no meaningful data will be obtained. This illustrates floor effects. Ceiling effects are the opposite of floor effects and are said to have occurred when subjects score close to the top of the scale simply because the task was too easy. Continuing with our example, if all the coloured lights are clearly visible because they are very bright, the task would be too easy for the subjects and a ceiling effect would occur. These effects can be obtained with any task whose dependent variable cannot track the full range of the independent variable. These effects must be prevented and this can be readily done because the researcher can check the appropriateness of the subject pool and variables through a pilot study before carrying out the experiment proper.
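
A check for floor and ceiling effects on trial data could be sketched as follows. This is a minimal illustration, not a standard procedure: the function name, the 0 to 10 scoring scale, the sample scores and the cut-off of "more than half of the subjects" are all assumptions made for the example.

```python
def detect_floor_or_ceiling(scores, scale_min, scale_max, majority=0.5):
    """Flag a data set in which most scores sit at one end of the scale.

    `majority` is an assumed cut-off: an effect is flagged when more than
    this proportion of subjects score exactly at the bottom or the top.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    at_floor = sum(1 for s in scores if s == scale_min) / n
    at_ceiling = sum(1 for s in scores if s == scale_max) / n
    if at_floor > majority:
        return "floor effect"
    if at_ceiling > majority:
        return "ceiling effect"
    return "no floor or ceiling effect"


# Hypothetical trial scores on a 0-10 light-detection task.
too_dim = [0, 0, 0, 1, 0, 0, 2, 0]      # task too difficult
too_bright = [10, 10, 9, 10, 10, 10]    # task too easy
suitable = [3, 5, 7, 4, 6, 8, 2]        # scores spread over the scale

for name, data in [("too dim", too_dim), ("too bright", too_bright), ("suitable", suitable)]:
    print(name, "->", detect_floor_or_ceiling(data, 0, 10))
```

If either effect is flagged, the difficulty of the task can be adjusted before the main experiment is run.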





In addition, a dependent variable must be sensitive, valid and reliable. To be sensitive, it must change clearly when the independent variable changes. To be valid, it must measure what it is intended to measure. To be reliable, a dependent variable must act consistently in relation to the independent variable. If subjects respond a certain way on one occasion but in a totally different way on another occasion when the independent variables remain the same, the dependent variable would be unreliable.
As it has been said above, the IV is manipulated by the experimenter and its effect is examined upon the DV. Some experts, depending upon the mode of manipulation, divide the IV into Type-E independent variable and Type-S independent variable (D'Amato 1970, 11). Type-E independent variable is one which is directly or experimentally manipulated by the experimenter, and Type-S independent variable is one which is manipulated through the process of selection only. Such variables are difficult to manipulate experimentally or directly. An example may illustrate the distinction between the Type-E and Type-S independent variables. Suppose the experimenter wants to study the effect of temperature upon the rate of production in an industry. Here the IV is the temperature and the DV is the rate of production. He may manipulate the temperature by dividing it into three categories (high, medium and low) and examine its effect upon the rate of production. Here the temperature is being directly manipulated by the experimenter and hence, it constitutes the example of Type-E independent variable. Suppose, for the time being, that the experimenter is interested in answering the question: Is the rate of production dependent upon the age of the workers? Age is here the independent variable. For investigating this problem, the experimenter will have to select groups of workers on the basis of their ages in a way by which he can get an appropriate representation from different age groups ranging from, say, 16 to 55 years. Subsequently, he will compare the rate of production obtained by each age group and finally conclude whether or not age is a factor in enhancement of the production. The age is being manipulated by the experimenter by means of selection in order to determine its effects upon the rate of production. Hence, this constitutes an example of Type-S independent variable. A research or investigation which involves the manipulation of the Type-E independent variable is called experimentation, no matter whether it is done in a laboratory or in a natural setting. Likewise, a research which involves the manipulation of the Type-S variable is called correlation research. Sometimes, a research is done in which there are no independent variables. Such research is termed as observation, e.g., when the researcher wants to study the factors underlying the mating behaviour of dogs, he looks at this behaviour from a distance for some time in order to have an idea regarding the factors underlying the mating behaviour. In this example the researcher or the investigator is not manipulating any variable, rather he is simply observing the mating behaviour to know the underlying factors. This illustrates what we call observation.

The independent variable (or the stimulus variable as Underwood calls it) may also be classified on the basis of the nature of the variables. Depending upon the nature of variables, independent variables may be classified into three categories: task variables, environmental variables and subject variables.



Task Variables

The task variables refer to those characteristics which are associated with a behavioural task presented to the subject. They include the physical characteristics of the apparatus as well as many features of the task procedures. There are different types of tasks. Some are simple and some complex. The simplicity or complexity of the task tends to produce a change in the behavioural measure. When, for example, the number of blind alleys in a particular maze is increased, learning the maze tends to be a difficult task for the subject. Likewise, when more stimuli are added to a study of complex reaction time, the reaction time is likely to increase.

Environmental Variables

Environmental variables refer to those characteristics of the environment, which are not the physical parts of the task as such, but tend to produce changes in the behavioural measures. Noise, temperature, levels of illumination and time of the day are examples of environmental variables. Suppose, for example, that the investigator wants to study how reading speed is influenced by the degree of vagueness in handwriting. It is likely that apart from the degree of vagueness, the reading speed may be influenced by the levels of illumination and noise occurring at the time of reading the materials. The intensity of the light and noise constitutes the example of the environmental variables because the variations in the intensity of the light and noise also tend to produce changes in the reading speed. Here the intensity of light and that of sound are not the physical parts of the task (handwriting) but tend to produce changes in the behavioural measure (reading speed).

Subject Variables

Subject variables refer to those characteristics of the subjects which are likely to produce changes in the behavioural measures. Sex, age, weight, anxiety, intelligence, etc., are the characteristics of the subjects (animals or humans), which may be conveniently termed as subject variables. Subject variables can be divided into two types: the natural subject variable and the induced subject variable. The natural subject variables are those variables which the subjects carry within themselves before the start of the experiment. Age, sex, intelligence and anxiety are examples of natural subject variables. Induced subject variables, also known as instructional variables, are those subject variables which are induced by the experimenter's instruction. Suppose the investigator wants to study the effect of two methods (A and B) upon the problem-solving behaviour. He may instruct one group of subjects with one set of instructions and another group of subjects with another set of instructions. The two different sets of instructions are likely to produce changes in problem-solving behaviour. Likewise, the investigator may want to study learning as a function of ego-involvement. He may, for example, tell one group of subjects that a given task is a test of their ability and tell another group that the task is being used as standard practice followed in the laboratory. These two types of instructions are likely to produce differences in performance.


Sometimes the experimenter manipulates a variable only for the sake of controlling its unwanted effect upon the dependent variable. Such a variable is not an independent variable. For example, suppose the experimenter is interested in studying whether or not the reaction time to light is shorter than the reaction time to tone. Having decided to conduct the experiment, he may make 200 random presentations of each stimulus. Now suppose, for the time being, he makes a systematic presentation of the stimulus rather than a random presentation. In the first 200 trials he presents tone and in the remaining 200 trials he presents light and subsequently, comes to the conclusion that the reaction time to light is shorter than the reaction time to tone. One may here argue that the obtained result is not due to the difference in stimulus but due to practice. The subject might have gained some cumulative experience in the presentations of tones which helped him in making quick reaction to light. Obviously, then, by making 200 random presentations of each stimulus he would be controlling its influence. Here the variable is the sequence of stimulus presentations. This variable is manipulated only to control its unwanted effect upon the dependent variable. Hence, this is not an independent variable. Thus, for a variable to be called an independent variable, it must be manipulated for the sole purpose of producing an effect upon the dependent variable. Otherwise, it will not be an independent variable.
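
A random order of stimulus presentations for the reaction time example could be generated along the following lines. The sketch assumes 200 presentations of each of the two stimuli (light and tone); the function name `presentation_order` and the `seed` parameter are hypothetical and are included only so that the same order can be reproduced.

```python
import random


def presentation_order(n_per_stimulus, stimuli=("light", "tone"), seed=None):
    """Return a randomly shuffled list with each stimulus repeated n times."""
    trials = [s for s in stimuli for _ in range(n_per_stimulus)]
    rng = random.Random(seed)   # separate generator, so the order is reproducible
    rng.shuffle(trials)
    return trials


order = presentation_order(200, seed=1)

# Each stimulus still appears 200 times, but no longer in one block,
# so practice gained on one stimulus cannot systematically favour the other.
print(len(order), order.count("light"), order.count("tone"))
print(order[:10])
```

Because the position of each stimulus is left to chance, any practice effect is spread over both stimuli instead of accumulating in favour of the one presented last.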
Control variables or relevant variables are those variables which are controlled by the experimenter because they are not of direct interest but are likely to produce changes in the behavioural measures. Think of the difference between a control variable and a controlled variable. Control variables are also known as extraneous variables. For example, suppose the investigator wants to study the relationship between the height and weight of some children. It is very likely that age will tend to affect this relationship because with advancing age, the height and weight of the children will increase. Age here is the example of the relevant variable because it tends to influence the relationship between height and weight of the children. Likewise, if the investigator wants to study the effect of two teaching methods (A and B) upon classroom achievement, intelligence may be one factor which, if uncontrolled, may produce additional differences in the classroom achievement. It is expected that the intelligent children will benefit more from either method of teaching and may improve their classroom achievement. Thus intelligence is here an example of the relevant variable. There are some variables which have no discernible effect upon the dependent variable. These are known as irrelevant variables. Relevant variables, or extraneous variables, are of three types, which must be controlled in any experiment.


1. Subject relevant variables: Subject relevant variables are those variables which constitute the characteristics of the subject and are controlled by the experimenter because he does not want to study their effect upon the behavioural measure (or the dependent variable). Type-S independent variables, which are the characteristics of subjects and lend themselves to manipulation through selection, may be conveniently grouped as the subject relevant variables. Thus age, intelligence, race, aptitude, personality classification, etc., are the examples of subject relevant variables. The students should note carefully that the same variable may act as an independent variable in one experimental situation and as a relevant variable in another experimental situation. A variable which is manipulated and whose effect is studied by the experimenter is called the independent variable but when the experimenter controls the variable because he is not interested in its effect upon the behavioural measure, it becomes an example of the relevant variable.

2. Situational relevant variables: The situational relevant variables refer to those environmental and task variables, whose effects (on the DV) are controlled by the experimenter because they are likely to produce unwanted changes in the DV. The temperature, levels of illumination, complexity of the task, etc., are examples of situational relevant variables. All those independent variables whose effects are unwanted and which can be directly manipulated by the experimenter (that is, Type-E independent variables) are included in situational relevant variables.

3. Sequence relevant variables: The sequence relevant variables are those variables which arise from the ordinal sequence that the conditions of the experiment occupy. For example, when the same subjects are exposed to more than one condition of the experiment (occurring in a particular sequence), factors like practice, fatigue and adaptation are likely to influence the behavioural measure (or the DV). In the first condition, the subject will not be affected by practice but in the last condition, he may be. Similarly, fatigue may not be influencing the performance in the first condition but it may influence the performance in the subsequent conditions. Practice and fatigue are examples of sequence relevant variables which are usually controlled by the counterbalancing design adopted by the experimenter (D'Amato 1970).


From the point of view of the unit of measurement, there are two important ways of
classifying variables:

(i) Whether the unit of measurement is categorical (as we find in case of nominal or ordinal
scale) or continuous in nature (as we find in case of interval or ratio scale)

(ii) Whether it is qualitative (as we find in nominal and ordinal scales) or quantitative in
nature (as we find in interval and ratio scale)

Variables thus classified on the basis of unit of measurement are called qualitative and
quantitative variables, and categorical and continuous variables. There is very little difference
between qualitative and categorical, and continuous and quantitative variables.
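
The two-way classification above can be restated as a small lookup. The following Python sketch is only an illustration: the mapping repeats points (i) and (ii) as given, and the names SCALE_CLASSES and classify_by_scale are hypothetical.

```python
# Assumption: the mapping simply restates points (i) and (ii) of the text;
# the names used here are hypothetical and chosen for illustration only.
SCALE_CLASSES = {
    "nominal": ("categorical", "qualitative"),
    "ordinal": ("categorical", "qualitative"),
    "interval": ("continuous", "quantitative"),
    "ratio": ("continuous", "quantitative"),
}


def classify_by_scale(scale):
    """Return (categorical/continuous, qualitative/quantitative) for a scale name."""
    return SCALE_CLASSES[scale.lower()]


for scale_name in ("nominal", "ordinal", "interval", "ratio"):
    print(scale_name, classify_by_scale(scale_name))
```

The lookup makes visible why the text says that the two classifications differ very little: both split the four scales at the same point.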


Qualitative Variables and Quantitative Variables


The qualitative variables refer to those variables which consist of categories that cannot be
ordered in magnitude. We cannot make such a statement regarding the qualitative variables as
'category X possesses higher (or lower) magnitude of the variable than category Y.' Thus the
qualitative variables comprise the categories which do not have a quantitative relationship
among themselves. Sex, race and religion are examples of qualitative variables because they
cannot be ordered in magnitude. Since the qualitative variables cannot be ordered in
magnitude, precise and accurate measurements are not possible. As a consequence, they are
least preferred in any scientific investigation.

The quantitative variables refer to those variables
which are composed of categories that can be ordered in magnitude. We can, for example, say
that category A possesses greater magnitude of the variable than category B. Intelligence, age,
levels of illumination, intensity of sound, etc., are examples of quantitative variables. We can say
that group A possesses a higher magnitude of intelligence than group B and older people get
fatigued sooner than younger and adult ones. Thus the variables can be ordered in terms of
magnitude. With the quantitative variables, precise and accurate measurements are possible
because they can easily be ordered in terms of increasing or decreasing magnitude. In
psychology and education, fortunately, most of the variables belong to the category of
quantitative variables. In sociology, qualitative variables are more common.


Continuous Variables and Discrete Variables 


Quantitative variables are further divided into two categories, namely, continuous variables and
discrete variables. A continuous variable is one which is capable of being measured in any
arbitrary degree of fineness or exactness. Of course, the measurement is subject to the limitations
of available tools. Age, height, intelligence, reaction time, etc., are some of the examples of a
continuous variable. The age of a person can be measured in years, months and days. To go to
an even smaller unit of measurement, age can be expressed in terms of hours, minutes and seconds
too. Likewise, reaction time can be measured in seconds and milliseconds. If required, it can
also be measured in terms of microseconds. Thus, all such variables which can be measured in
the smallest degree of fineness are examples of continuous variables.

The discrete variables (also known as categorical variables) are those variables which are not
capable of being measured in any arbitrary degree of fineness or exactness because the variables
contain a clear gap. For example, the number of members in a family constitutes an example of a
discrete variable. The members in a family may be any number like 5, 6, 7, and so on. No amount
of refinement in the measuring instrument can produce a fractional value for the number of
members. As a general rule, discrete variables are those variables whose values can be determined
by counting. The number of children in a family, the number of females in a particular district,
the number of books in a library, and so on, are some of the examples of discrete variables
(D'Amato 1970). In psychology and education, most of the quantitative variables belong to the
category of continuous variables. It should, however, be noted that independent variables may be
qualitative, quantitative, continuous or discrete. The same is true for dependent variables.
However, in most of the psychological and educational researches, independent variables ...


Moderator Variables and Intervening Variables


The moderator variables are independent variables (also called the secondary
independent variables) which are selected by the experimenter because he suspects that these
variables may alter or moderate the relationship between the primary (or main) independent
variable and the dependent variable. Thus, the moderator variables may be defined as those
variables which are manipulated or selected by the experimenter because they are suspected to
moderate the relationship of the independent variable with the dependent variable. For example,
suppose an investigator wants to study the relative effectiveness of the lecture method and the
demonstration method upon classroom achievement. He may take two groups of students of the
same class, one to be taught by the lecture method and the other by the demonstration
method. But he suspects that the level of intelligence may be a factor which can moderate the
relationship between the two methods (the main independent variables) and the classroom
achievement (the dependent variable). He may, therefore, manipulate intelligence by dividing
both groups into three subgroups having high intelligence, average intelligence and low
intelligence. Thus a total of six subgroups would be formed, out of which three subgroups
would be taught by the lecture method and the remaining three subgroups would be taught by
the demonstration method. Subsequently, the classroom achievement of these six subgroups may
be compared with each other. In the above example, intelligence is manipulated by the
experimenter because he suspects that this factor may alter or moderate the relationship between
the independent variables and the dependent variable.

The intervening variables are the
variables which theoretically exist and tend to influence the behavioural measure. Such variables
cannot be seen and/or manipulated by the experimenter and their effect can be inferred from the
effects of the independent variables as well as the moderator variables upon the dependent
variables. Often such variables are not named in the proposed research and hence, little attention
is given to them. Suppose the investigator wants to test the following hypothesis: A person who is
allowed to reach his goal displays less aggressive acts than the person who is not allowed to
reach his goal. In this hypothesis the DV is the aggressive act, the IV is being allowed or not being
allowed to reach the goal, and the intervening variable is the frustration. Thus the intervening
variable is the variable which is influenced by the independent variable, the relevant variable
and the moderator variable and in turn, affects the dependent variable. The intervening variable
can be easily identified by carefully examining the hypothesis and asking the question, "What is
it about the independent variable that will cause the predicted outcome?" (Tuckman 1978).

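
The six subgroups of the teaching-method example can be enumerated mechanically by crossing the main independent variable with the moderator variable. The following Python sketch is only an illustration under that reading; the function name build_subgroups and the level labels are hypothetical.

```python
from itertools import product

# Levels taken from the example in the text; the labels are illustrative.
methods = ["lecture", "demonstration"]       # main independent variable
intelligence = ["high", "average", "low"]    # moderator variable


def build_subgroups(iv_levels, moderator_levels):
    """Cross every level of the independent variable with every moderator level."""
    return list(product(iv_levels, moderator_levels))


subgroups = build_subgroups(methods, intelligence)
for method, level in subgroups:
    print(f"{method} method, {level} intelligence")
```

Comparing achievement across the cells of this 2 x 3 layout is what allows the investigator to see whether the effect of teaching method changes with the level of intelligence.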
Active Variables and Attribute Variables 


From the viewpoint of study design that may be experimental or nonexperimental, there are two
sets of variables: active variables and attribute variables. A variable which is manipulated by the
experimenter is the active variable and the variable which is not manipulated but measured by
the experimenter is the attribute variable or organismic variable. Examples of active variables are
reward, punishment, methods of teaching, etc. Attribute variables are human characteristics
which have already been determined. They cannot be directly altered or manipulated by the
experimenter. When the investigator wants to know whether or not ten-year-old girls are superior
to ten-year-old boys in numerical ability, it serves as an example of the study of one organismic
variable or attribute variable (sex) upon the dependent variable (the numerical ability).

The above distinction between the active variable and the attribute variable is general and is
sometimes confusing as well. There are some variables which can be categorized as attribute
variables as well as active variables. For example, anxiety is one such variable. Anxiety can be
manipulated by giving a set instruction to the subjects. In this case, it becomes an active variable.
Anxiety can also be measured with the help of a scale or test. In this case, it constitutes the
attribute variable. Thus the variable of anxiety can be studied either as an active
variable or as an attribute or organismic variable. Several other examples can be cited in this way.


DIFFERENCE BETWEEN A VARIABLE AND A CONCEPT

Anybody contemplating research must know the difference between a concept and a
variable. A concept is a mental image or perception and, therefore, its meaning may vary from
person to person. A variable, on the other hand, is measurable with varying degrees of
accuracy. A concept cannot be measured, whereas the variable can be measured by crude or
refined units of measurement. Thus measurability is the main criterion of the difference between a
concept and a variable. Since a concept is a subjective impression, its understanding differs from
person to person and, if measured somehow, it may produce problems in comparing the
responses. Therefore, it is suggested that the concept should be converted into a variable so that it
can be subjected to measurement, though with varying degree of precision from scale to scale.
Some examples of variables are age, gender, attitude and intelligence, which can be measured,
whereas excellence, satisfaction and domestic violence are examples of concepts.

METHODS OF MEASURING DEPENDENT VARIABLES

In fact, the researcher selects a dependent variable in the study and attempts to get an
operational definition of it. The dependent variable he will select and the operational definition
he will provide depend upon the hypothesis and behaviour he is studying, the
population and the perspective that he seeks. For measurement of dependent variables, the
following approaches are generally undertaken:

(i) Direct observation of the targeted behaviour: The dependent variable may involve
direct observation and measurement of the behaviour of interest. For example, in a study of
eating behaviour, Schachter (1968) measured the amount of food eaten by the subjects as a
function of different eating cues present. Likewise, Cunningham et al. (1990) conducted one
study in which they studied how mood influences our willingness to help others. They presented
subjects with cheerful or sad statements, then observed whether subjects would help a
confederate.

(ii) Indirect measures of unseen internal processes: Sometimes the behaviour to be
observed is more an indirect indication of unseen internal processes. The dependent variable
measures an observable response that the researcher thinks is correlated with the unseen process.
Therefore, changes in the response are used to make inferences about changes in the unseen
processes. For example, several researches have been conducted in which the researchers take
physiological measurements of heart rate, breathing, blood pressure and perspiration level to
draw inferences about the subject's anxiety or stress level.

The use of indirect measures is especially common in studies of young children because
they cannot directly tell us about their experiences. Tronick (1989) conducted one study in which
he measured the length of time a child stared at a stimulus or noted when he smiled at it, to infer a
positive emotional state produced by a given stimulus. In memory research, the number of words
correctly or incorrectly recalled or recognized under different conditions is used to infer
cognitive processes of retention and forgetting (Craik & Lockhart 1972).





A very common indirect measure is reaction time, that is, the time a subject takes to respond to a
stimulus. On the basis of reaction time to stimuli that differ along a physical or mental
dimension, the researcher uses differences in reaction time to infer underlying cognitive or
emotional processes involved therein. For example, Sternberg (1969) measured subjects' reaction
time to a 'probe word' presented on a screen that either was or was not a part of a previous list,
to infer how subjects search through the information in their memory.

(iii) Judgements about a stimulus: In this approach to the measurement of dependent
variable, subjects are asked to make judgements about a stimulus and then, it is observed how they
make judgements as a function of the conditions. Another such approach is the
forced-choice technique in which the subjects must select from the possible choices provided.
For example, researchers often manipulate characteristics of a visual illusion and
require subjects to tell whether or not they still perceive the illusion (Stuart & … 1991).
Sometimes researchers try to obtain more refined judgements. For example, the subjects may be
asked to judge the duration of an interval in seconds after certain types of activities have
occurred. In case the judgements are not easily verbalized, subjects may be asked to make some
symbolic responses. For example, Block (…) conducted one experiment in which subjects drew a
line to indicate the perceived duration of a time interval.

(iv) Self-reports: Another approach to the measurement of dependent variable is to obtain
self-reports from the subjects. In such reports subjects provide descriptions about their own
feelings, ideas or thoughts. Measurement of dependent variable by self-reports is commonly
found in rating scales. For example, Schwartz and Clore (1983) manipulated the description that
subjects heard about their physical surroundings and then required them to rate statements about
their mood level and causes for it. Likewise, such rating scales can also be used to measure
subjects' attitudes, their perceptions or their confidence in their memory for an event (Brandsford
& Franks 1971).

In this way, we find that there are several approaches to the measurement of dependent
variables. Depending upon the hypothesis and variables included in the study, any one or more
approaches are employed by the researchers.

IMPORTANT CONSIDERATIONS IN SELECTION OF VARIABLES 


In any behavioural research, the investigator tries to take a decision regarding the total pool of the
variables which are likely to influence the dependent variable. At this juncture, apart from the
selection of the main (or independent) variable, he tries to take a decision regarding the variables
to be controlled (that is, extraneous variables) as well as variables to be treated as moderator
variables. Tuckman (1978) has recommended three important considerations which should be
kept in view by the investigator in making the above decision.

1. Theoretical considerations: In choosing a particular variable as a moderator variable,
the researcher should try to know how it interacts with the independent variable for producing
changes in the dependent variable. How frequent is this interaction? He should also see
whether or not the variable is related to the theory with which he is working.

2. Design considerations: The variables must be selected in view of the scope of design of
the experiment. The variables selected as the moderator variable and the extraneous variable must
be such that they meet the basic requirements of the experimental design. The independent variable
must be such that its manipulation can be easily done within the framework of the
experimental design.

3. Practical considerations: A researcher should also attend to practical
considerations in selecting variables. He should limit the number of variables to be incorporated
in the study because it is not possible to study too many variables at a time. In selecting variables
as moderator variables, independent variables and extraneous variables, he should also keep in
view the financial resources available for the purpose. The time consideration should also not be
ignored. He should also see the nature of the variables because some variables are easy to study
and some are difficult. Above all, in selecting the variables he should also be aware of the control
he is likely to have over experimental situations. When the variables are difficult to study, he is
likely to lose control over the experimental situation.

In selecting variables for any investigation the above considerations, if kept in view, can be
useful and are likely to increase the validity of the research.

IMPORTANT APPROACHES TO MANIPULATING INDEPENDENT VARIABLES


Manipulation of independent variables in experimental researches is very important. There are
several ways of manipulating independent variables. However, the important approaches or
ways of such manipulation are given below.

(i) By manipulating context: Often the researcher manipulates the context in which the
stimulus is presented in order to manipulate the independent variables. Researchers studying
learning processes have generally examined the effect of different types and amounts of rewards
or punishments on the subjects' ability to learn. Some other researchers have induced different
types of moods through hypnosis and have then examined their effect upon subjects' memory and
thought processes. In all these examples, the independent variables are manipulated
through manipulating the context.

(ii) By presenting different stimuli: Researchers often manipulate the independent variables
by creating various conditions of an independent variable by presenting subjects with different
stimuli or changing the characteristics of a stimulus. For example, in studying aggression, Green
(1978) exposed subjects to films in an experiment in which different amounts of aggression were
depicted in different conditions and its impact upon the aggressiveness subsequently shown by
the subjects was examined. Likewise, researches in the field of applied social psychology have
measured the effect of various furniture arrangements on social interactions (Campbell & Harren
1978).

(iii) By manipulating the likely information to be given to the subjects: Sometimes the
independent variable may consist of the instructions or information given to the participants in
each condition. These instructions are themselves varied and their impacts are examined. For
example, a 1975 study of memory examined how different information conveyed in questions about
an event affected the subject's recall of that event. Likewise, Phesterson, Kiesler & Goldberg
(1971) conducted one study to examine whether the subjects would evaluate a painting in a
sex-biased manner; the subjects were presented a painting and were told that the
supposed painter was a male or female.


(iv) By manipulating social setting in which confederates are used: Some researchers have
tried to manipulate independent variables by manipulating the social setting in which help is
taken from confederates, who are people entrusted by a researcher to act as other subjects, to see
how the real subject may respond to this. For example, Asch (1951) conducted a study in which
confederates were present and a subject, asked to judge the length of a line,
overestimated its length in order to show conformity to the social pressure being exerted by the
group of confederates.

(v) By manipulating intervening variables: An intervening variable is one that
influences a behaviour. In fact, it intervenes or comes between the independent and dependent
variables. The intervening variable is influenced by the independent variable and, in turn,
influences the dependent variable. For example, if we frustrate the subjects by preventing them
from obtaining a desirable reward, it will lead to a state of anger which, in turn, leads to greater
aggressive behaviour. Here the independent variable, frustration, is being manipulated by
preventing subjects from obtaining a desirable reward. In doing so, we presumably change
the intervening variable of anger which is likely to influence the subject's scores on the
dependent variable of aggressiveness.

Here it would be profitable to make a distinction between two types of internal
states: a state characteristic and a trait characteristic. A state characteristic is a temporary
condition that is influenced by situational factors. For example, state anxiety is a state
characteristic because the level of anxiety the person will experience depends upon the situation
he is in. A trait characteristic is something stable over time and not easily influenced by
situational factors. For example, self-esteem is generally defined as a trait characteristic because
it is not influenced by the situation in which the person is. It is important to remember that state
characteristics can be manipulated and studied as intervening variables because they can change
in response to the conditions of the independent variable. Trait characteristics cannot be studied in
this way because they are not influenced by most manipulations due to their permanent nature.


(vi) By stressing or overloading psychological system: In some researches, the researchers
try to manipulate the independent variable by overloading or stressing a psychological system
and then inferring from the subject's responses how the system normally works or operates. For
example, in one study subjects were socially or emotionally stressed by having them produce a
behaviour that conflicts with an attitude they advocate and measuring how they resolve such
conflict (Sherman & Gorkin 1980). Likewise, in one study of attention the researchers presented
two 'dichotic messages' simultaneously, one to each ear. In this case, the independent variable
consisted of varied instructions or message content and the dependent variable was the way in
which each message was heard (Gray & Wedderburn 1960).

(vii) By manipulating physiological processes: In some researches, the researchers
manipulate the independent variable by manipulating the physiological process of the organism.
For example, in many animal researches the researchers may employ different surgical
techniques to create different conditions in which some parts of the brain are removed or cut
(Tokunaga et al. 1986). The dependent variables in such studies were the impact of such removal
or cut upon the animal's behaviour, motivation, memory, and so on.

In this way, we find that independent variables in psychological researches are manipulated
in different ways. Depending upon the nature of variables, the researchers choose any one or two
methods of manipulation for their research work.


TECHNIQUES OF CONTROLLING EXTRANEOUS VARIABLES 


As we know, the extraneous variables are those that operate in the experimental situation in
addition to the independent variables and affect the dependent variables. It is, therefore, essential
that extraneous variables be controlled. If the researcher fails to control the extraneous
variables, it results in a confounded experiment. The following are the five important ways to
control the extraneous variables:


(1) Technique of elimination 
(2) Constancy of conditions 
(3) Balancing 

(4) Counterbalancing 

(5) Randomization 


These methods can be discussed as follows: 


Technique of Elimination


The simplest way to control the extraneous variable is to eliminate it completely from the
experimental situation. For example, if noise in any experiment is an extraneous variable, the
simple way to control it is to make the experimental situation soundproof. However, it is not so
simple to control many extraneous variables. For example, controlling organismic variables like
age, sex, intelligence through the technique of elimination is very difficult.


Constancy of Conditions 


Where the extraneous variables can't be controlled through the technique of elimination, they
can be controlled by holding their values constant for all participants in all conditions. This is
known as constancy of conditions. By holding constant the instructions to be given to every
participant, by holding the time of the day constant for all participants, and by holding lighting
conditions constant, extraneous variables related to instruction, time of the day and lighting
conditions can be controlled. The apparatus for administering the experimental treatment and for
recording the results should also be kept constant in all conditions for all participants. Sometimes
the organismic variables like sex, age, intelligence also become important extraneous variables.
To control these extraneous variables, the researcher chooses only those participants or subjects
who are homogeneous with respect to either sex or age or intelligence. Using all participants of
the same sex naturally controls the effect of sex, if any, upon dependent variables.


Balancing 


When, for some reason, it is not possible for the researcher to hold the various conditions
constant, he may try to control the extraneous variables by balancing, a technique used in the
following two situations:

(i) where the researcher remains unable to identify the extraneous variables

(ii) where they are readily identified and the researcher takes special steps to control them

Let us consider the first situation where the researcher has no idea about the likely
extraneous variables that might be influencing the dependent variable. In such a situation the
researcher gives equal treatment to all subjects, but the experimental group is treated in a
different way from the control group. Consequently, wherever the extraneous variables operate,
they influence both the experimental group and the control group in equal manner and their
effect is, thus, balanced. The changes occurring in the dependent variable are clearly attributed
to the changes done in the independent variable. Suppose a set of four extraneous variables is
influencing the experimental group in addition to the independent variable. The effects of these
extraneous variables can be balanced out by allowing them to operate also on the control group.
As a consequence, the independent variable will be the only one that can differentially influence
the two groups (cf. Figure 20.1).

Another situation of balancing is that in which the experimenter knows well about the
extraneous variables that are of influence. Suppose the researcher knows that sex is the
extraneous variable in a particular experiment. To control the effect of sex by balancing, the
experimenter may assign an equal number of subjects from each sex to each group. This will
balance out the effect of sex upon dependent variables. Likewise, if age is an extraneous variable
in an experiment, he may assign an equal number of each age classification to each group.
Thus balancing helps in controlling the extraneous variable. The obvious assumption of
balancing is that the participants in the control group and the experimental group are initially
equivalent. They are only treated differently.
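
The second situation of balancing, in which a known extraneous variable such as sex is spread equally over the groups, can be sketched in Python as follows. This is a minimal illustration and not a prescribed procedure: the function name balanced_assignment, the subject records and the group labels are hypothetical, and an exactly equal split assumes that the number of subjects at each level is a multiple of the number of groups.

```python
from collections import Counter


def balanced_assignment(subjects, key, groups=("experimental", "control")):
    """Deal the subjects of each level of `key` out to the groups in turn,
    so that every group receives an equal number from each level."""
    assignment = {group: [] for group in groups}
    by_level = {}
    for subject in subjects:
        by_level.setdefault(subject[key], []).append(subject)
    for level_subjects in by_level.values():
        for index, subject in enumerate(level_subjects):
            assignment[groups[index % len(groups)]].append(subject)
    return assignment


# Hypothetical pool: four female and four male subjects.
pool = [{"id": i, "sex": "F" if i < 4 else "M"} for i in range(8)]
assigned = balanced_assignment(pool, "sex")
for group, members in assigned.items():
    print(group, Counter(member["sex"] for member in members))
```

In an actual study the order of subjects within each level would normally be shuffled before dealing them out, so that the balancing does not depend on the order in which subjects were listed.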


[Figure: the experimental group is influenced by extraneous variables 1 to 4 and a positive
amount of the independent variable; the control group is influenced by the same extraneous
variables 1 to 4 and a zero level of the independent variable.]

Fig. 20.1 Illustration of the use of the control group as a technique of balancing


Counterbalancing


Counterbalancing is used for controlling the extraneous variables of amount of
practice and fatigue. There are such experiments in which each subject is required to serve under
several different experimental conditions and there is the probability that the participant's
performance might improve due to practice or might decrease due to fatigue. The technique of
counterbalancing is used to distribute these practice and fatigue effects, together called order
effects, equally over all conditions. Therefore, whatever their effects, they tend to influence
behaviour under each condition equally since each condition occurs an equal number of times at
each stage of practice. Thus the general principle of counterbalancing is that each condition must
be presented to each subject an equal number of times and each condition must occur an equal
number of times at each practice session. Besides, each condition must precede and follow all
other conditions an equal number of times.
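
The general principle stated above can be checked on a small case. The following Python sketch assumes complete counterbalancing, that is, using every possible order of the conditions, which is one way (not the only way) of satisfying the principle. The function names are hypothetical, and "precede and follow" is read here as immediate precedence.

```python
from collections import Counter
from itertools import permutations


def complete_counterbalance(conditions):
    """Return every possible order of the conditions."""
    return list(permutations(conditions))


def position_counts(orders):
    """Count how often each condition occurs at each ordinal position."""
    counts = Counter()
    for order in orders:
        for position, condition in enumerate(order):
            counts[(condition, position)] += 1
    return counts


def precedence_counts(orders):
    """Count how often condition x comes immediately before condition y."""
    counts = Counter()
    for order in orders:
        for x, y in zip(order, order[1:]):
            counts[(x, y)] += 1
    return counts


orders = complete_counterbalance(["A", "B", "C"])
print(len(orders), "orders")
print(position_counts(orders))
print(precedence_counts(orders))
```

With three conditions there are six orders, and both tallies come out equal for every condition and every pair, so practice and fatigue are spread evenly. With many conditions the number of orders grows quickly, which is why partial schemes are often preferred in practice.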


When the experimenter tries to control the order effects by means of counterbalancing, he
assumes that there is occurring a differential transfer or asymmetrical transfer which needs to be
controlled. Asymmetrical transfer is the transfer in which transfer from condition A (when it
occurs first) to condition B is different from the transfer from condition B (when it occurs first)
to condition A. Table 20.1 illustrates the symmetrical transfer and asymmetrical transfer. In
Table 20.1 we find that there are two conditions of study, A and B. In a symmetrical transfer those
who received condition A first obtained a score of 6 on dependent variables and when
subjected to second experimental condition B, they got a score of 9. The difference was three.

Table 20.1 Symmetrical and asymmetrical transfer between conditions A and B

[Table body not recoverable. Column groups: Symmetrical transfer and Asymmetrical transfer,
each with a Difference column; rows for each half of the subjects.]



The other half of the subjects received a score of 4 on condition A and a score of 7 in
condition B. The difference was again 3. Thus the difference for both the groups was the same and
hence, the transfer was symmetrical, that is, no differential transfer was found. But in the
asymmetrical transfer, the difference for one half of subjects from going to A and B was 3 but for
the other half of subjects from going to A and B was 5. In the symmetrical condition it would be
legitimate to combine the values of A (6 + 7) and compare it with the values of B (9 + 4), but in
the asymmetrical condition it will not be possible for the researcher to combine the values of A
and compare it with the values of B. Counterbalancing is done for controlling asymmetrical
transfer. If the design is perfectly counterbalanced, the transfer will be a symmetrical one.
The difference between balancing and counterbalancing is sometimes confused. In fact, the two
are different. The experimenter preferably resorts to counterbalancing when each subject receives
more than one treatment (AB or BA) and an attempt is made to distribute the fatigue and practice
effects over the experimental conditions. In balancing, this is not the case because here each
subject receives only one experimental treatment and the effects of extraneous variables are
balanced out by affecting the members of both the experimental group and the control group
equally.



Randomization 


Randomization is a very popular technique of controlling extraneous variables. Randomization
refers to a technique in which each member of the population or universe has an equal and
independent chance of being selected. Randomization is used where the experimenter assumes
that some extraneous variables operate, but he can't specify them and, therefore, can't apply the
other techniques of controlling extraneous variables. This technique is also applied where the
extraneous variables are known but their effects can't be controlled by known techniques.

The importance of randomization lies in the fact that this technique randomly distributes the 
extraneous effects over the experimental and control conditions. Such balancing occurs whether 
or not the experimenter has identified certain extraneous variables, because the effects of 
unknown or unspecified extraneous variables are said to be equally distributed across different 
conditions of the experiment when the experimenter randomly assigns subjects to the different
groups or conditions. 


Suppose the experimenter wants to study the effect of lighting upon the reading ability of the
subjects and for this, he tests two groups under two different conditions. The experimental group
is tested in the morning conditions where subdued lighting is found and the control group is
tested in afternoon conditions where bright light is found. The experimental group had been
given training on how to read correctly. In such a situation it is expected that the reading
ability score for the experimental group will be inferior to the reading ability score of the
control group (because of subdued lighting) despite the training. But if each subject has an equal
opportunity to serve in the morning or afternoon then, on the average, the lighting will affect
both the groups equally and confounding due to lighting conditions will be automatically
eliminated.
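
The lighting example can be turned into a small scheduling sketch. It assumes ten subjects per group and splits each group at random into equal morning and afternoon halves, so that every subject has the same chance of either session. The subject IDs, group sizes and function name are hypothetical and are not taken from the text.

```python
import random


def assign_sessions(groups, seed=0):
    """Randomly split every group between morning and afternoon sessions.

    `groups` maps a group name to a list of subject IDs (hypothetical data).
    Each subject has an equal chance of landing in either session.
    """
    rng = random.Random(seed)
    schedule = {}
    for group_name, subjects in groups.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        schedule[group_name] = {
            "morning": sorted(shuffled[:half]),
            "afternoon": sorted(shuffled[half:]),
        }
    return schedule


groups = {
    "experimental": [f"E{i}" for i in range(1, 11)],
    "control": [f"C{i}" for i in range(1, 11)],
}
schedule = assign_sessions(groups)
for name, sessions in schedule.items():
    print(name, len(sessions["morning"]), len(sessions["afternoon"]))
```

Because both groups are now spread evenly over subdued and bright lighting, lighting can no longer be confounded with the training given to the experimental group. Forcing equal halves is a design choice of this sketch; independent coin flips for each subject would equalize the groups only on the average.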

In this way, it is obvious that the extraneous variables can be controlled in several ways. Of 
these various ways, randomization, balancing and counterbalancing are relatively more popular. 


CONTROLLING DEMAND CHARACTERISTICS 


In a behavioural study, the social and physical surroundings in the situation sometimes provide
cues that essentially demand that we should behave in a certain way and not in some other way. In
research, these cues are called demand characteristics. In other words, demand characteristics
are defined as cues provided by the research context that guide or bias a subject's behaviour in a
particular direction. In fact, subjects depend upon these demand characteristics for answering
questions like: What am I supposed to do? What is really going on here? What meanings will be
derived from my responses?


Demand characteristics can be explained in terms of the following four components: subjects,
environment, measurement procedures and the experimenter.

Subjects bring certain kinds of attitudes and impulses that influence their behaviours. While
subjects try to do their best as instructed in the study, they are also on the lookout for cues.
Different subjects react to such cues differently. Some subjects behave cooperatively and do what
they are supposed to do, whereas some subjects may behave in a defensive way.

The environment in which the study is carried out also feeds such reactions among subjects. Such
an environment may distract subjects from the main task or cause them to react in a certain biased
way. Events such as noises or changes in the level of illumination are interpreted differently by
different subjects and lead them erroneously to conclude about what is being studied and how they
should respond in such situations.

The measurement procedure also produces demand characteristics in light of which subjects
interpret their experiences during the study. For example, in studies of imagery, subjects are
required to imagine a previously presented map, and the time that they take in mentally travelling
between locations is measured. Generally, subjects take a longer time when locations on the map
are farther apart than when they are nearer. This result may occur not because it reflects how
subjects travel along mental maps but because subjects know fully well that longer distances
should take longer and therefore, they 'oblige' the experimenter.

The experimenter is also one important source of demand characteristics during the study. The
experimenter generally dresses and behaves formally to seek maximum cooperation from the
subjects. But such formality may inhibit many of the subject's normal reactions. On the other
hand, if the experimenter acts and dresses informally, it may encourage inattentiveness and poor
performance. Apart from this, subjects are very sensitive to the experimenter's expectancies and
therefore, any subtle cue about the response provided by the experimenter affects the behaviour
of the subjects. Such expectancies occur because the experimenter, as researcher, knows about
the predictions of the study and may inadvertently communicate them to the subjects, thus
affecting their behaviour.

In this way we find that there are several components of demand characteristics that, in fact,
modify the subject's behaviour. What are the impacts of these demand characteristics? The
obvious impact is that the subjects respond to these cues or characteristics rather than
responding solely to the treatment given in the condition. As a consequence, we lose both the
internal and the external validity of the study. Internal validity is lost because the subjects
behave in the predicted manner, making it appear that our manipulation influences their behaviour,
whereas the reality is that the cues provided by the demand characteristics themselves make
subjects behave in that way. Sometimes it may also happen that cues cause subjects to behave
contrary to the predictions, making it appear that the manipulation does not influence behaviour.

External validity is lost because the results cannot be generalized to situations in which the
demand characteristics are absent. Demand characteristics also reduce reliability because not all
subjects react to them in the same way; in this case, inconsistency occurs, increasing the
variance in the scores of the subjects.

Now the question arises: How can we control these demand characteristics? To answer this
question, the following measures are usually taken:

(i) General strategies of control for demand characteristics: Researchers have evolved three
general strategies for controlling demand characteristics. The first strategy is that the
researcher should provide as few cues as possible. If the subjects have no cues or fewer cues,
they will behave in a natural manner. For providing few cues, the experimenter should not divulge
the specific purpose, should not include extraneous, distracting information, should avoid
extraneous behaviours or comments and should hide threatening equipment and apparatuses.

The second general strategy is to neutralize the present cues to the extent it is possible. The
researcher tries to be neither friendly nor unfriendly and keeps his position neutral. Thus the
experimenter presents the task without implying that it is difficult or easy. He also does not
indicate what is a normal response. He also tries to neutralize the subject's fear and suspicions.

The third general strategy is to seek experimental realism, which refers to the extent to which
the experimental task engages subjects psychologically such that they become less concerned with
demand characteristics. For establishing experimental realism, the experimenter defines the
independent variable and other related aspects of the experimental task in such a way that the
subjects may find the task interesting and challenging. In this way, they will 'forget' about the
demand characteristics and will be realistically engaged in the task.

(ii) Use of deception for controlling demand characteristics: Deception is another way of
controlling demand characteristics. Deception refers to the creation of an artificial situation or
a 'cover story' for disguising the procedures of the study. In such a situation subjects are not
aware of the independent variable being manipulated and so they don't feel pressurized to respond
in a certain way. Deception is used when a straightforward presentation of a measurement
procedure produces such strong demand characteristics that they cannot be eliminated or
neutralized in a proper way. Sometimes, the experimenters use demand characteristics to their
advantage, disguising a task by incorporating a procedure that may look scientific. For example,
Duclos et al. (1989) conducted an experiment in which they manipulated the posture adopted by the
subjects in order to determine how posture influences their mood. In this study, 'fake' electrodes
for measuring brain activity were attached to them and they were told that their posture was
important for accurate and precise measurement.

(iii) Placebo to control demand characteristics: Placebo is also used to control the demand
characteristics. If the design of the study is such that it uses both an experimental group as
well as a control group, placebo is used to control the demand characteristics. What happens is
that if the experimental group is exposed to stimuli, tasks or instructions which are not
presented to the control group, then a confounding occurs: here the researcher is not sure whether
the experimental group behaves differently because of the treatment or because of the demand
characteristics that accompany or follow it. It can be explained through an example like this:
Suppose the experimental group is given alcohol and the control group is not. Any impairment in
behaviour that the experimental group demonstrates may be due to alcohol or it may be due to the
fact that giving subjects alcohol implies that they will behave drunkenly and so they do. To
control such demand characteristics, the control group is given a placebo, that is, this group
will also be given something that will be called alcohol and whose smell and taste will resemble
those of alcohol, but in reality it will contain zero amount of alcohol. In such a situation, any
difference in behaviour between the control group and the experimental group can be attributed to
the real alcohol given to the experimental group. Thus we see that placebos are given to control
groups so that they will experience the same demand characteristics as the experimental groups. In
this way, the effect of demand characteristics is equalized for both groups and becomes
controlled.

(iv) Controlling expectancies of subject and experimenter: It is a well-known fact that the
expectancies of the subject and the experimenter influence the outcome of the study by way of







introducing various kinds of demand characteristics. For controlling the subjects' expectations, a
single-blind procedure is adopted in which subjects are unaware of the specific condition they are
receiving. Likewise, for controlling the experimenter's expectations, a double-blind procedure is
used in which both the researcher and the subjects are unaware of the specific conditions being
presented. The original experimenter trains others to actually conduct the study; they don't know,
and in that way remain blind to, the condition a subject is in and the predictions of the study.
Suppose, for example, the experimenter is interested in knowing the influence of a drug or
medicine. If he knows to which group of subjects a placebo or the real drug is being administered,
he may directly or indirectly communicate his expectations about such treatment to the subjects.
These expectations alone can produce various kinds of reactions. However, if the experimenter does
not know or is blind to such information, no expectations will be communicated to the subjects
that may influence the study's outcome.
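
One way to picture the double-blind procedure is as two separate records: a sealed allocation key held by someone who never meets the subjects, and a coded sheet used by the person who runs the sessions. The sketch below is only an illustration under those assumptions; the function name, the `DOSE-` code format and the subject IDs are invented for the example.

```python
import random


def make_double_blind_schedule(subject_ids, treatments=("drug", "placebo"), seed=None):
    """Return (sealed_key, blinded_sheet) for a hypothetical double-blind study.

    sealed_key    maps subject -> real treatment; kept away from the session experimenter.
    blinded_sheet maps subject -> opaque code; the only record the session experimenter sees.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    # Alternate treatments over the shuffled list so the groups are equal in size.
    sealed_key = {
        subject: treatments[index % len(treatments)]
        for index, subject in enumerate(shuffled)
    }
    # Codes are drawn independently of the treatment, so they reveal nothing about it.
    code_numbers = rng.sample(range(1000, 10000), len(shuffled))
    blinded_sheet = {
        subject: f"DOSE-{number}"
        for subject, number in zip(sorted(sealed_key), code_numbers)
    }
    return sealed_key, blinded_sheet


subjects = [f"S{i:02d}" for i in range(1, 9)]
sealed_key, blinded_sheet = make_double_blind_schedule(subjects, seed=42)
for subject in subjects:
    print(subject, blinded_sheet[subject])  # no treatment names appear here
```

The session experimenter hands out whatever carries the subject's code, and the sealed key is opened only after all scores are recorded. Neither the subject nor the person conducting the session can then communicate an expectation tied to the real treatment.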

Thus we find that there are several ways of controlling demand characteristics which, if they
remain uncontrolled, do influence the outcome of the study seriously.


Review Questions

1. Discuss the various kinds of variables and indicate their importance in psychological
   research.
2. State the importance of control in an experimental study of behaviour. Describe with
   illustrations the different techniques that have been devised for controlling
   extraneous variables.
3. Differentiate between constructs and variables. Giving suitable examples, elucidate the
   different types of variables.
4. Discuss the important considerations in selection of variables. Illustrate through
   examples the importance of balancing and counterbalancing in the control of
   extraneous variables.
5. Discuss, with examples, the various ways of manipulating independent variables.
6. Discuss the various methods of measuring dependent variables.
7. Describe the various techniques of controlling demand characteristics in an experiment.
8. Write short notes on the following:
   (a) Type-E and Type-S independent variable
   (b) Demand characteristics
   (c) Extraneous variable





21 
RESEARCH DESIGN 


CHAPTER PREVIEW

• Meaning and Purpose of Research Design
    Experimental Variance
    Extraneous Variance or Control Variance
    Error Variance
• Criteria of Research Design
    Capability to Answer Research Questions Adequately
    Control of Variables
    Generalizability
• Basic Principles of Experimental Design
    Replication
    Randomization
    Local Control
• Basic Terms Used in Experimental Design
• Some Important Types of Research Design
• Between-Subjects Design
    Two-Randomized-Groups Design
    More-Than-Two-Randomized-Groups Design
    Matched-Groups Design
    Which Design to Prefer: Randomized-Groups or Matched-Groups
    Factorial Design
• Problem of Creating Equivalent Groups in Between-Subjects Design
• Within-Subjects Design
• Problem of Controlling Sequence Effects in Between-Subjects Design
• Comparison of Between-Subjects Design and Within-Subjects Design
• Experimental Design Based upon the Campbell and Stanley Classification
• Pre-Experimental Design (Nondesigns)
    One-Shot Case Study
    One-Group Pretest-Posttest Design
    Static-Group Comparison (or Intact-Group Comparison)
• True Experimental Designs
    Posttest only, Equivalent Group Design
    Pretest-Posttest Control Group Design
    Solomon Four-Group Design
    Latin Square Design
• Quasi-Experimental Designs
    Time-Series Designs
    Equivalent Time-Samples Design
    Nonequivalent Control Group Design
    Counterbalanced Design
    Separate Sample Pretest-Posttest Design
    Patched-up Design
    Longitudinal Design
    Cross-sectional Design
    Cohort Design
• Ex Post Facto Design
    Correlational Design
    Criterion-Group Design
• Steps in Experimentation


Research design is the detailed plan of the investigation. In fact, it is the blueprint of the
detailed procedures of testing the hypotheses and analysing the obtained data. Research design may
be defined as the sequence of those steps taken ahead of time to ensure that the relevant data
will be collected in a way that permits objective analysis of the different hypotheses formulated
with respect to the research problems. Thus, research design helps the researcher in testing the
hypotheses by reaching valid and objective conclusions regarding the relationship between
independent and dependent variables. The selection of any research design is obviously not based
upon the whims of the researcher; rather, it is based upon the purpose of the investigation, the
types of variables and the conditions in which the research is to be conducted.

The purpose of any research design is to provide a maximum amount of information relevant to the
problem under investigation at a minimum cost. Basically, a research design serves two functions.

First, it answers the research questions as objectively, validly and economically as possible.
In fact, this is an important function served by any research design. The research problems are
usually epitomized by the hypotheses. A research design suggests to the researcher how to collect
data for testing these hypotheses, which variables should be treated as control variables, what
methods of manipulation will be more adequate in a particular context, what types of statistical
analyses should be done and, finally, a possible answer to the research problems. Thus a research
design, after moving through the sequence of different related steps, enables the researcher to
draw a valid and objective answer to research problems.

Second, a research design also acts as a control mechanism. In other words, it enables the
researcher to control unwanted variances. In any scientific investigation there are three types of
common variances, namely, the experimental variance, the extraneous variance and the error
variance, with which the researcher is directly concerned. A discussion of these variances is
presented below.


Experimental Variance

Experimental variance is produced in the dependent variable by the manipulation of the
experimental variables by the experimenter. Usually, the experimenter or the investigator attempts
to maximize the experimental variance so that he can get as valid and objective data as is
possible. For maximizing the experimental variance, the investigator designs the experiment in a
way in which the different experimental conditions become as different as possible. Suppose the
experimenter is interested in studying how the rate of learning a task is influenced by
differential reward. This simple experiment may be conducted by randomly placing subjects similar
in age, sex, intelligence, etc. (all these here constitute examples of extraneous variables) in
three experimental conditions: high reward, medium reward and low reward. After the subjects in
each experimental condition have learnt the task, the rate of learning may be expressed in terms
of, say, the trials taken to reach the criterion. Fewer trials to reach the criterion may indicate
higher speed of learning the task. If it comes out that the condition for the high reward is
producing a higher rate of learning and the condition for the low reward is producing a poor rate
of learning, the experimental variance is high. Suppose for the time being that the experimenter
designs the experiment in a way in which all the three conditions are nearly equal (say, for
example, all three conditions are either high or low reward conditions); the experimental variance
will then be at a minimum because little difference is effected in the learning rate by the three
conditions. Thus the higher the difference between the experimental conditions, the higher the
experimental variance. The experimenter always tries to maximize the experimental variance.
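
The reward example can be made concrete with a rough numerical sketch. Here the spread among the condition means is used as a simple index of experimental variance. This index and all the trial counts are assumptions made for illustration; none of the numbers come from the text.

```python
from statistics import mean, pvariance


def experimental_variance(scores_by_condition):
    """Variance among condition means: a simple index of between-condition spread."""
    condition_means = [mean(scores) for scores in scores_by_condition.values()]
    return pvariance(condition_means)


# Hypothetical trials-to-criterion scores (fewer trials = faster learning).
distinct_rewards = {
    "high reward": [8, 9, 10],
    "medium reward": [14, 15, 16],
    "low reward": [20, 21, 22],
}
# All three conditions are high-reward conditions, so they barely differ.
similar_rewards = {
    "high reward 1": [8, 9, 10],
    "high reward 2": [9, 9, 9],
    "high reward 3": [8, 10, 9],
}

print(experimental_variance(distinct_rewards))
print(experimental_variance(similar_rewards))
```

The well-separated conditions give a large value, whereas the three near-identical conditions give a value of zero, which mirrors the point that experimental variance grows with the difference between the experimental conditions.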


Extraneous Variance or Control Variance

Extraneous variance or control variance is produced by the extraneous variables or the relevant
variables. The experimenter always tries to control the relevant variables and thus to eliminate
the variance produced by these variables. For elimination of extraneous variance, it is essential
that the extraneous variables be properly controlled. There are five such ways, which have been
discussed in Chapter 20.


Error Variance

The third function of a research design is to minimize error variance. Error variance is defined
as those variabilities in the measures which occur as a function of the factors not controllable
by the experimenter. Such factors may be related to the individual differences among the subjects
themselves, such as their attitude, motivation, need, ability, etc., or they may be related to
what is commonly called the errors of measurement, such as the differences in trials, differences
in conditions of the experiment, the temporary emotional state of the subject, fatigability, etc.
Whatever the source of error variance, it has three distinct features.

1. Error variance is self-compensating because sometimes the variability is positive and
   sometimes negative.
2. Since error variance is both positive and negative, it tends to cancel out over several
   measurements.
3. Error variance is unpredictable, probably because it is based on random errors. In this
   way, error variance is distinct from systematic variance because the latter is predictable and
   based upon systematic errors.

Minimizing the error variance serves two purposes. First, it improves the reliability of measures
so that the generalization can be more dependable and accurate. Thus it strengthens the external
validity. Second, it gives the systematic variance a chance to show its significance, if it is
really significant. In the analysis of variance (ANOVA), if the within-group variance is large,
the systematic variance (or between-group variance) is less likely to appear significant.

The error variance can usually be minimized by carefully controlling the conditions of the
experiment. If the different conditions of an experiment are fully controlled, the occurrence of
error variance is minimized to a great extent. In an experiment allowing many uncontrolled
conditions, the error variance accumulates to higher proportions.

Finally, it may be added that one of the objectives of a research design is to provide
unambiguous results and to avoid confounding of variables. The term 'confounding' refers to a
state in an experimental situation where two or more variables have changed together, so that the
resulting change in the dependent variable cannot be attributed to any one variable. Obviously,
confounding of variables produces vagueness in stating a cause-effect (or independent-dependent
variable) relationship. A good research design also purports to avoid such confounding of
variables.
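
The remark about ANOVA can be illustrated with a hand computation of the one-way F ratio, F = MS_between / MS_within. The sketch below assumes three small groups with identical means in both data sets, so that only the within-group (error) variance differs. The scores are hypothetical.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F = MS_between / MS_within for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    k = len(groups)            # number of groups
    n_total = len(all_scores)  # total number of scores
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Same group means (10, 15, 20) in both data sets; only the error variance differs.
well_controlled = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]
poorly_controlled = [[4, 10, 16], [9, 15, 21], [14, 20, 26]]

print(f_ratio(well_controlled))
print(f_ratio(poorly_controlled))
```

The between-group mean square is 75 in both cases, but the within-group mean square rises from 1 to 36, so the F ratio falls from 75 to about 2.08. The same systematic difference is therefore much harder to detect when error variance is left uncontrolled.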





CRITERIA OF RESEARCH DESIGN 


As we know, research design is data discipline. There are various types of research designs; some
are weak designs while some are strong. Behavioural researchers have been able to formulate
certain criteria on the basis of which a weak research design can readily be distinguished from a
strong one. These criteria have proved very useful in guiding the researchers in the right
direction. These criteria are mentioned below.
1. Capability to answer research questions adequately 


2. Control of variables 

3. Generalizability 

A discussion of these criteria follows.
Capability to Answer Research Questions Adequately 
A good research design is the design that answers research questions adequately. Sometimes the
researcher selects a design which is not appropriate for answering the research questions. Such
designs constitute an example of weak research design. Such a design does not adequately test the
hypothesis either. It is a common practice that students, while trying to answer a question by
conducting an experiment or doing research, often match the sex, age and intelligence of the
subjects on the assumption that such matching would lead to the setting up of an equivalent
experimental group and control group. The reality is that if there is no relation between, say,
age and the dependent variable, then matching on age will be irrelevant. Therefore, any design
based upon such matching would be a weak design.

Likewise, if the research problem is such that the setting up of four groups, namely, three
experimental groups and one control group or four experimental groups, is necessary but the
researcher takes only two groups and utilizes a two-group randomized design, this will again be
an example of a design which will not answer the research questions adequately. Likewise, for
testing an interaction hypothesis a factorial design would be an appropriate one, whereas a
two-group randomized design would be an example of an inappropriate design.



Control of Variables 
Another criterion of a good research design is that it should control the effects of extraneous
variables, which are more or less similar to independent variables in that they have the capacity
to influence dependent variables. If left uncontrolled, such variables are called independent
extraneous variables or simply extraneous variables. A design which fails to control the effect of
extraneous variables is considered a weak one and the researcher should avoid such a design.
There are various ways to control the effects of extraneous variables. Of these ways,
randomization is considered by many as one of the best techniques of controlling the extraneous
variables. There are three basic phases in randomization: random selection of subjects, random
assignment of subjects into control and experimental groups and random assignment of experimental
treatments among different groups. Sometimes, it happens that it is not possible for the
researcher to make a random selection of subjects. In such situations, the researcher tries to
randomly assign the selected subjects into different experimental groups. When this random
assignment is not possible due to any reason, the researcher randomly assigns the different
experimental treatments among experimental groups. Whatever the method may be, randomization has
proved very useful in controlling the extraneous variables. A design which fully controls the
extraneous variables is considered to be the best design for the research. This increases the
internal validity of the research.

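
The three phases of randomization described here can be sketched as one short routine. It assumes a numbered population of 100 persons, a sample of 20 and two treatments; these figures, the function name and the labels are illustrative only.

```python
import random


def three_phase_randomization(population, sample_size, treatments, seed=None):
    """Sketch of the three phases: selection, group assignment, treatment assignment."""
    rng = random.Random(seed)

    # Phase 1: random selection of subjects from the population.
    sample = rng.sample(population, sample_size)

    # Phase 2: random assignment of the selected subjects into groups.
    rng.shuffle(sample)
    n_groups = len(treatments)
    groups = [sample[i::n_groups] for i in range(n_groups)]

    # Phase 3: random assignment of the treatments among the groups.
    shuffled_treatments = list(treatments)
    rng.shuffle(shuffled_treatments)
    return dict(zip(shuffled_treatments, groups))


population = list(range(1, 101))  # hypothetical subject numbers
plan = three_phase_randomization(population, 20, ["experimental", "control"], seed=7)
for treatment, members in plan.items():
    print(treatment, sorted(members))
```

When phase 1 is not feasible, the same routine can be started from an already available sample by passing that sample as the population and setting `sample_size` to its length, which corresponds to the fallback described in the paragraph.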


Generalizability 
The third criterion of a research design is generalizability. Generalizability is the external
validity of the research. In other words, it refers to the extent to which the results of the
research can be generalized to subjects, groups or conditions not included in the sample of the
research. If the design is such that the obtained results can be generalized to larger groups or
subjects, the design is considered to be a good one. In fact, how much the researcher can
generalize the results obtained is a complex question and it is concerned not only with technical
matters of the research like sampling and research design but also with the larger problems of
basic and applied research. In basic or fundamental research, the question of generalizability is
not serious because this is not the first consideration. Here the researcher is basically
concerned with examining the relation among variables and exploring why the variables are so
related. But in applied research, the researcher's main concern is with generalizability because
the researcher intends to apply the obtained results to other persons, conditions or situations.

These are the basic criteria of distinguishing a good research design from a weak research
design.

BASIC PRINCIPLES OF EXPERIMENTAL DESIGN 

Having discussed the different types of research design, we are now in a better position to
concentrate upon the basic principles of experimental investigation. The base of all experimental
investigation is the experimental design, which is one of the types of research design. An
experimental design may be defined as simply a sequence of steps (taken ahead of time), which
permit the objective analysis of objective data in a way that a definite cause-effect relationship
can be inferred between the independent variable and the dependent variable. As earlier
discussed, an independent variable is defined as the variable which is manipulated by the
experimenter either directly or through selection, so that its effects can be studied upon the
behavioural measure. The variable is so named because we can change it independently. A
dependent variable is the response variable or the behavioural measure, which is affected by the
manipulation of the independent variable. The variable is so named because responses depend
upon those changes or manipulations. Suppose, for example, the experimenter is interested in
studying the effect of the intensity of illumination upon retinal fatigue. Obviously, in this
situation the experimenter would change the levels of intensity of illumination and would see how
long the subject is able to read under each level. In this example, the intensity of illumination
is the independent variable and retinal fatigue is the dependent variable.

A sound experimental design is based upon some principles. According to Ostle & Mensing
(1975, 260), there are three basic principles of an experimental design: replication,
randomization and local control.

Replication

The term 'replication' is really a fusion of two words, namely, duplication and repetition. It refers to the deliberate repetition of an experiment, using a nearly identical procedure with a different set of subjects, in a different setting and at a different time. Winer (1971, 391) has similarly said, "A replication of an experiment is an independent repetition under as nearly identical conditions as the nature of the experimental material will permit." Replication helps a person in revalidating a previous study or raises some questions [...]. Replication provides a very accurate estimate of the experimental error [...] unit of measurement in evaluating the significance of [...] differences. The term 'experimental error' is very frequently used in the literature of experimental design. Hence, it needs a more detailed explanation. Experimental error refers to the errors occurring due to faulty [...], biased observation, [...] the uncontrolled extraneous variables [...]. [...] be confused with multiple measurements [...] of intensity of light and experiment. [...]
Randomization

Randomization is the [...] principle of experimental design. It makes the [...] test used in an experimental situation [...]. A common assumption is that the observations are [...] of the observations [...] when the sample is randomly drawn from the population or the subjects are randomly assigned to the experimental [...]. Randomization ensures the independence of the observations [...]. Not only this, when subjects are randomly assigned [...] the experimental treatments are randomly assigned to [...] the extraneous variables, which, otherwise, are left uncontrolled. [...] like an 'insurance' which is "always a good idea, and [...] than we expect [...]" (Ostle & Mensing 1975). Sometimes it has been [...] complete randomization becomes difficult. This is especially true when one is dealing with [...] variables. In such a situation the experimenter should not insist on complete randomization. A middle [...] complete randomization and complete nonrandomization [...] be followed. A very encouraging suggestion has been made by [...]: "There are situations in which complete randomization is impossible or uneconomical. The statistician should not, therefore, adopt the unrealistic position of insisting on complete randomization in every case. On the other hand, neither should he agree to the use of a completely systematic design ... Some intermediate position [...] most realistic."
Local Control

By local control we mean the amount of balancing, blocking and grouping of the subjects or the experimental units employed in an experimental design. Thus local control is, in fact, regarded as a part of the entire edifice of the experimental design. These three terms ('balancing', 'blocking' and 'grouping') need further explanation. The term 'grouping' is easy to define. It refers to the assignment of homogeneous subjects or experimental units into a group so that different groups of homogeneous subjects may be available for differential experimental treatments. Blocking refers to the assignment of experimental units to different blocks in such a manner that the assigned experimental units within a block may be homogeneous. Balancing in an experimental design refers to the fact that the grouping, blocking and assignment of experimental units to the different treatments are done in such a way that the resulting design may be a balanced one. A design that is statistically and experimentally sound must possess the property of local control.

EXPERIMENTAL DESIGN 

Experimental design has a well-defined set of terminology. It is essential to get acquainted with these terminologies. These are discussed below.

(a) Factor: A factor is a variable which [...]. A factor is also referred to [...] experimental variable, [...] interchangeably. [...] variable, which is manipulated by the experimenter. This is called as experimental [...]. Suppose the experimenter manipulates the intensity of sound to examine [...] performance of a sensory motor task. Here sound is an experimental variable and [...] E-type of factor [...] manipulated directly by the experimenter. Besides this, there may be S-type of factor, which is manipulated by selection. In fact, these are subject-related variables, which can't be directly manipulated by the experimenter. For example, if the experimenter is interested in studying the impact of age on visual acuity, he may manipulate the age by selecting subjects of different age levels. When a variable is manipulated by selection, it is generally called as a classification factor. In factor [...] correlated with each other and not with others, each clump (group of variables [...]) is called a factor.

Factors in an experimental design are usually denoted with capital letters such as A, B, C, D, etc. For example, in the experiment trying to examine the impact of levels of illuminance and levels of noise upon performance of letter cancellation task, level of illuminance may be designated as A factor and level of noise as B factor.

(b) Levels: By level is meant each specific variation in a factor. For example, the factor sound may consist of three levels: 40 dB, 60 dB and 80 dB. The experimenter tends to choose the number of levels of a factor in either of the two ways: by adopting some systematic, nonrandom procedure or by random procedure. Some factors may have an infinite number of potential levels (such as intensity of sound) and others may have only a few (for example, SES, sex of the subject). In case the experimenter decides to select p levels from potential P levels by using some systematic, nonrandom procedure, this factor is considered to be a fixed factor. This is known as fixed effect model where the chosen levels are the ones in which we are interested and the experimenter does not wish to generalize beyond these levels of the factors. Thus if one wants to replicate the experiment, the same set of treatments/levels will be included in each replication and the drawn conclusions are restricted to those particular experimental treatments or levels.

In contrast to the systematic and nonrandom procedure, the experimenter may decide to include the p levels from potential P levels through random procedure. In such a situation, this is considered as random factor and this is called as random effect model or variance component model, which is rarely used. This model requires that the population of levels should be defined and then, selection of several levels be made randomly. In behavioural sciences, this model seems unrealistic because here the experimenters deliberately choose certain levels of factors for their experiment.

(c) Treatment combinations: Treatment loosely means subjecting some person or something to some action. In experimental design, it refers to a particular experimental manipulation or procedure under which participants are run. For example, in a 2 x 3 factorial experiment the participants are assigned to 6 treatments. In the present text, the terms treatment and treatment combination will be used interchangeably. In a single-factor experiment, the levels of the factor themselves constitute treatments. For example, suppose the experimenter is studying the impact of the level of illumination upon reading ability and he decides to have four levels of illumination. Here there will be four treatments and each participant will be randomly assigned to one of these four treatments if a randomized group design is followed. In a multiple-factor experiment such as 3 x 3 x 4, there will be 36 treatment combinations and each participant will be randomly assigned to one of these 36 treatment combinations.

(d) Dimensions: The dimensions of a factorial experiment are indexed by the number of levels of each factor and the number of factors. For example, a factorial experiment in which there are four factors, the first having 2 levels, the second having 3 levels, the third having 3 levels and the fourth having 4 levels, is called a 2 x 3 x 3 x 4 (read as two by three by three by four) factorial experiment. The dimension of this experiment is 2 x 3 x 3 x 4.
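
The counting of treatment combinations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the factor names and level labels below are hypothetical and are not taken from any particular experiment in the text.

```python
from itertools import product

# Hypothetical factors and levels for a 2 x 3 factorial experiment.
# The names and labels are illustrative assumptions only.
factors = {
    "A (illumination)": ["low", "high"],
    "B (noise)": ["40 dB", "60 dB", "80 dB"],
}


def treatment_combinations(factors):
    """Return every treatment combination: one level taken from each factor."""
    return list(product(*factors.values()))


def dimension(factors):
    """Return the dimension of the design as a string such as '2 x 3'."""
    return " x ".join(str(len(levels)) for levels in factors.values())


combos = treatment_combinations(factors)
print(dimension(factors), "design,", len(combos), "treatment combinations")
for combo in combos:
    print(combo)
```

The number of treatment combinations is simply the product of the numbers of levels, which is why a 3 x 3 x 4 experiment has 36 of them.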

(e) Main effects: Main effects refer to the difference in performance from one level to another for a particular factor, averaged over the other factors. In factorial experiment, the mean squares (MS) for the levels of factors are known as the main effect of the factor. Let us illustrate the point with an example. Suppose the experimenter has planned a factorial experiment with four factors A, B, C and D, in which factor A has two levels [...]. The sum of squares for A corresponds to a comparison between levels A1 and A2, the B sum of squares to a comparison among the levels of B, the C sum of squares to a comparison between levels C1, C2, C3 and C4, and the D sum of squares to a comparison among the levels of D. The difference in performance between A1 and A2, averaged over the levels of B, C and D, is called as the main effect of A. [...] The difference in performance among levels B1, B2 [...], averaged over the levels of factors A, C and D, is called main effect of B, and so on. This is also called as B-effect.

When graphically represented, the main effect is the curve joining points representing the levels of a particular factor averaged over other factors of the experiment. A significant main effect will have a significant slope, that is, the curve will not be parallel to the X-axis.

(f) Simple effects: In a factorial experiment, the effect of a treatment on one factor at a given level of the other factor is known as the simple effect. Let us take an example of a 2 x 2 factorial experiment in which both factors A and B have two levels each (A1, A2, B1, B2). The effect of treatment on two levels of A under each of the two levels of B is called simple effect of A. Likewise, the effect of treatment on two levels of B under each of the two levels of A is called simple effect of B. [...] interaction. For example, the simple effect of A is the profile where the levels of factor A are marked on X-axis and the two curves represented by the levels B1 and B2 are the simple effects at each of the two levels. Likewise, the simple effect of B is the profile where the levels of factor B are marked on X-axis and the two curves represented by A1 and A2 are the simple effects at each level.

(g) Interaction effect: Interaction effect is an important effect in any factorial design because it allows the experimenter to investigate the interaction among the independent variables. An effect of two independent variables operating simultaneously and in combination on a dependent variable is called as interaction effect. In fact, it is an effect over and above the sum of the effects of each independent variable working separately.

Interaction is said to have occurred when change in one value of one variable brings a change in the effects of the other. An interaction effect is an effect in which the impact of one variable [...] is different across the levels of other variable (Aron, Aron & [...]). [...] There is an interaction between two factors if the effect of one factor depends on the levels of the second factor. When two factors are shown as A and B, the interaction is identified as the A x B interaction. Let us take an example to illustrate the interaction.

Suppose the experimenter tested a group of some rats (N = 25) on maze learning task. Some were administered drug at two months of age and some other rats were administered drug at the age of 10 months. There was also a group of rats that were administered no-drug at either 2 months of age or 10 months of age. Now the question is: Does this drug effect (one factor) depend upon age (the level of second factor)? The hypothetical data are presented below for explaining interaction.

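A

              2 months    10 months    Overall
Drug          X = 70      X = 90       X = 80
No-drug       X = 40      X = 60       X = 50

[Fig. 21.1: mean scores for the drug and no-drug groups plotted against age (2 months, 10 months)]

B

              2 months    10 months    Overall
Drug          X = 50      X = 110      X = 80
No-drug       X = 50      X = 70       X = 60

[Fig. 21.2: mean scores for the drug and no-drug groups plotted against age (2 months, 10 months)]

C

              2 months    10 months    Overall
Drug          X = 70      X = 40       X = 55
No-drug       X = 40      X = 70       X = 55
Difference    30          -30

[Figure for data C: mean scores for the drug and no-drug groups plotted against age (2 months, 10 months)]
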
If you look at data and graph presented in A, it will be obvious that such data present no interaction. If you pay attention to main effect for the drug in this set of data, the overall mean score for the drug group is 80 and for no-drug group is 50. This means that there is an overall 30-point drug effect. Does this drug effect depend upon age (levels of the second factor)? Now it is necessary to look at each group separately. For animals tested at 2 months, the drug effect is exactly 30 points (70-40) and for animals tested at 10 months, the drug effect is still 30 points (90-60). It means that drug effect does not depend upon age. Therefore, there is no interaction. When the effect of one treatment remains constant across all levels of the other factor (showing no interaction), there will be a constant distance between two lines in the graph. Graph of such data will have parallel lines.

Now look at the data and graph presented in B. Here the overall drug effect is 20 points (that is, X = 80 for drug group and X = 60 for no-drug group of animals). But when you look at each age group separately, you find a 40-point drug effect for animals given drug at 10 months of age, whereas such drug effect was zero at 2 months of age. It means that the effect of drug does depend upon the age of animals and therefore, there is an interaction. Here lines of the graph are not parallel; rather they converge at a point.

Now look at the data and graph presented in C. For these data, the overall drug effect is zero (55-55). Both the drug and no-drug groups show an overall mean of 55. Does this mean that the drug has no effect? The answer is no. The drug produces a 30-point increase in mean score for young animals and a 30-point decrease in the mean score for old animals. Thus the effect of the drug depends upon which age-group is being examined. It means that there is an interaction. Lines in the graph cross each other providing evidence for interaction.

One point of caution is that when there is an interaction effect, the main effects may be misleading. Therefore, whenever an interaction is present, one must interpret the effects of each factor in terms of its interaction with the other factor.
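
The reasoning applied to data sets A, B and C can be checked with a short calculation. The sketch below is illustrative only. It assumes equal numbers of animals in every cell, so that an overall mean is the simple average of the two cell means, and it uses cell means consistent with the overall means and drug effects described in the text (for B and C, the 10-month cell means are inferred from those descriptions). The function name is hypothetical.

```python
# Cell means for the drug-by-age example: each tuple is (2 months, 10 months).
# Assumption: equal cell sizes, so overall means are unweighted averages.
DATA = {
    "A": {"drug": (70, 90), "no_drug": (40, 60)},
    "B": {"drug": (50, 110), "no_drug": (50, 70)},
    "C": {"drug": (70, 40), "no_drug": (40, 70)},
}


def drug_effects(cells):
    """Return (main effect, simple effects at each age, interaction contrast)."""
    drug, no_drug = cells["drug"], cells["no_drug"]
    # Main effect of drug: difference between the overall means of the two rows.
    main = sum(drug) / 2 - sum(no_drug) / 2
    # Simple effects of drug: drug minus no-drug at each age level separately.
    simple = (drug[0] - no_drug[0], drug[1] - no_drug[1])
    # Interaction contrast: how much the drug effect changes from 2 to 10 months.
    interaction = simple[1] - simple[0]
    return main, simple, interaction


for label, cells in DATA.items():
    main, simple, interaction = drug_effects(cells)
    print(label, "main:", main, "simple:", simple, "interaction contrast:", interaction)
```

An interaction contrast of zero corresponds to parallel lines in the graph. Any non-zero value means the simple effects differ across age levels, which is exactly the sense in which the drug effect "depends upon" age.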


SOME IMPORTANT TYPES OF RESEARCH DESIGN

As stated earlier, the design of an experiment is the blueprint of the procedures that enables the experimenter to test various hypotheses. It is, therefore, essential that the experimenter at the outset must decide what type of design he will frame for testing the hypotheses. The selection of a design is partly dependent upon the experimenter's decision regarding whether he is going to use one or more than one group of subjects in the proposed experiment. If he decides that there will be only one group of subjects who will be tested under different values (or conditions or treatments) of the independent variable, the resulting experimental design is known as the within-subjects design or repeated treatment design. If, on the other hand, he decides to use a separate group of subjects for each value of the independent variable, the resulting design is referred to as the between-subjects design ([...] & Roediger 1984). Thus based upon the criterion of the number of groups used in the experiment, there are two types of research designs: between-subjects design and within-subjects design.

In psychological and educational researches, the between-subjects design has been used more frequently than the within-subjects design. Before we go into the details of these two types of designs, we shall discuss the main subtypes of these designs. The between-subjects design is divided into three most common types: randomized-groups design, matched-groups design and factorial design. A randomized-groups design is one in which subjects are randomly assigned to the different groups meant for the different conditions or values of the independent variable. It is [...] random assignment of subjects into two or more groups will [...] groups statistically [...] subject relevant variables (attitude, ability) [...] dependent variable. When the subjects are randomly assigned to only two groups, the resulting experimental design is known as two-randomized-groups design and when the subjects are randomly assigned to more than two groups, the resulting design is known as the multi-groups design or more-than-two-randomized-groups design. Matched-groups design is a type of between-subjects design in which the subjects are matched depending upon mean, standard deviation, pairs, etc. Likewise, a factorial design, which is also a type of between-groups design, may be defined as the design in which two or more than two independent variables are studied in all possible combinations (each combination having a separate group of equal or unequal subjects) so that their independent and interactive effects may be studied on the dependent variable.

Within-subjects design is of two types—complete within-subjects design and incomplete 
within-subjects design. A complete within-subjects design is one in which practice effects are 
balanced by administering the conditions several times to each subject, in different orders each 
time, such that the results for each subject become interpretable. An incomplete within-subjects 
design is one in which each condition is administered to each subject only once while varying 
the order of administering the conditions across subjects in such a way that practice effects can 
be neutralized when the results for all subjects are combined. The common techniques for 
balancing practice effects in complete within-subjects design are block randomization and ABBA 
counterbalancing while in incomplete within-subjects design, one common technique is to use all possible orders of the treatments and subsequent assignment of each subject to one of the orders.
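
Two of the balancing techniques just named can be sketched briefly in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, and cycling subjects through the list of orders is only one possible way of assigning each subject to one of the orders.

```python
from itertools import permutations


def abba_sequence(conditions):
    """ABBA counterbalancing: run the conditions in order, then in reverse order."""
    return list(conditions) + list(reversed(conditions))


def all_possible_orders(conditions):
    """Every possible order of the treatments (n! orders for n conditions)."""
    return [list(order) for order in permutations(conditions)]


def assign_orders(subjects, conditions):
    """Assign each subject to one order, cycling through all possible orders.

    Assumption: cycling is used here purely for illustration.
    """
    orders = all_possible_orders(conditions)
    return {subject: orders[i % len(orders)] for i, subject in enumerate(subjects)}


print(abba_sequence(["A", "B"]))
for subject, order in assign_orders(["S1", "S2", "S3", "S4", "S5", "S6"], ["A", "B", "C"]).items():
    print(subject, order)
```

With three conditions there are six possible orders, so the number of subjects would ordinarily be a multiple of six if every order is to be used equally often.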


For easy understanding of the above designs with the entire picture at a glance, Figure 21.1 
has been given which arranges them into a schematic presentation. 


Experimental Design
    Between-subjects design
        Randomized-groups design
            Two-randomized-groups design
            More-than-two-randomized-groups design
        Matched-groups design
        Factorial design
    Within-subjects design
        Complete
        Incomplete

Fig. 21.1 Schematic representation of experimental design

A critical evaluation of each of these designs is presented below.
BETWEEN-SUBJECTS DESIGN 


Two-Randomized-Groups Design 


A two-randomized-groups design is so called because here the subjects are randomly assigned to two groups only. In formulating this type of design, the experimenter first defines the independent and the dependent variables. Subsequently, he selects two values of the independent variable. These two values may also be called 'conditions' or 'treatments' of the experiment. His main interest is to examine whether or not these two conditions affect the dependent variable in a differential way. In selecting a sample for the proposed experiment, he defines the population. Suppose the experimenter wants to study the effect of reward upon the rate of learning verbal concepts among kindergarten children. For this, he specifies the population which consists of all kindergarten children in a particular district or state. He wants to make a generalization about this population. Further, suppose that this population consists of 5,000 children in a district. He may randomly select a sample of 100 children. These subjects may be regarded as representatives of the population. Further, in this experiment the experimenter may wish to have two conditions of reward: presence of the reward or rewarded condition, and absence of the reward or nonrewarded condition. Hence the 100 subjects will be divided into two groups. The experimenter may randomly assign the subjects into two groups. For this, he may write the names of the subjects on separate slips of equal size and same colour, or he may number them on a separate slip. The 100 slips may be folded in like manner and placed in a box for reshuffle. The experimenter may decide that the first slip will go to the first group, the second slip to the second group, the third slip to the first group, the fourth slip to the second group, and so on. This process will finally give two equal N groups of 50 each. Now, just by a toss of a coin, the experimenter may again decide which group will be named as the experimental or rewarded group and which as the control group (nonrewarded group). It is expected that these two groups will not differ significantly at the start of the experiment. The experimental group is given one type of treatment and the control group is given another type of treatment. In the above example the children of the experimental group will be rewarded for learning the verbal concepts, whereas the children of the control group will not be rewarded for learning the same task. The nature of the reward will depend upon the predetermination of the experimenter; it may be verbal or monetary. Subsequently, the scores of all subjects of these two groups on the dependent variable (learning of verbal concepts) will be recorded and subjected to statistical analysis. Usually the t test or its nonparametric substitute, the Mann-Whitney U test, is applied in a two-randomized-groups design. (A detailed discussion of these statistical tests is done in Chapter 23.) The experiment may be continued for several days or weeks. If the statistical test reveals that these two groups significantly differ on the measure of the dependent variable, it is concluded that the difference in the dependent variable is due to the experimental manipulation of the independent variable. If, for example, the experimental group is 5 units ahead in learning the verbal concepts and this difference is statistically significant, we conclude that reward accelerates the learning of verbal concepts in kindergarten children.
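
The slip-drawing procedure and the subsequent comparison of the two groups can be sketched as follows. This is only an illustrative sketch using the Python standard library. The function names are hypothetical, the two short lists of scores are invented purely to show the arithmetic, and the statistic computed is the ordinary pooled-variance t for two independent groups (the detailed treatment of the test itself belongs to Chapter 23).

```python
import random
from math import sqrt
from statistics import mean, variance


def assign_two_groups(subjects, rng):
    """Shuffle the 'slips', then send alternate slips to group 1 and group 2."""
    slips = list(subjects)
    rng.shuffle(slips)  # reshuffling the folded slips in the box
    return slips[0::2], slips[1::2]  # 1st, 3rd, ... to group 1; 2nd, 4th, ... to group 2


def pooled_t(x, y):
    """t statistic for two independent groups, using the pooled variance."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


rng = random.Random(2024)  # fixed seed only so that the sketch is repeatable
group_1, group_2 = assign_two_groups(range(1, 101), rng)
print(len(group_1), len(group_2))

# Hypothetical scores, invented for illustration only (not data from the text).
rewarded = [12, 15, 14, 16, 13]
nonrewarded = [9, 11, 10, 12, 8]
print("t =", pooled_t(rewarded, nonrewarded), "with df =", len(rewarded) + len(nonrewarded) - 2)
```

The obtained t would then be compared with the tabled critical value for the given degrees of freedom.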

Before concluding the discussion of the two-randomized-groups design, a retreat is essential here. As it is obvious from the above interpretation, the heart of the two-randomized-groups design is a random assignment of subjects into groups so that there will not be any systematic relationship between any of the characteristics of the subjects and the particular group to which they are randomly assigned. The question is how to achieve this end. Underwood (1966) has suggested two primary ways through which unbiased groups or random groups of subjects can be formed: captive assignment and sequential assignment.

The technique of captive assignment is one in which all subjects are individually known to the experimenter by name and they are all present at one time. In a way, they are made captive for the purpose of the experiment so that they can at random be assigned to the different groups. That is why the technique is known as a captive assignment. When the experimenter is to use all subjects from a single class, this best illustrates the situation prevailing in the captive assignment. Several techniques can be adopted for randomly assigning the captive subjects into different groups. Suppose there are 50 students in a class. The experimenter wants to conduct a two-condition experiment. He may, then, randomly assign these subjects into two groups of preferably equal number of subjects. It is not necessary that in a randomized-groups design, the number of subjects in each group should be equal, but an equal number is preferred because it facilitates some statistical [...] (Edwards 1968; Winer 1971). For the random assignment of subjects into different groups, any of the following procedures can be adopted.

1. One method of randomly assigning the subjects into the different groups is the use of the Table of Random Numbers in selecting the [...]. This can be illustrated with an example. Suppose 50 students are to be randomly assigned to two groups. Any convenient starting point in the Table of Random Numbers is chosen and we [...] the sequence of numbers, [...] a number if it is either 1 or 2. Suppose [...], then only 221 is selected and the remaining numbers are ignored. Now, the first student in the alphabetical list is assigned to the second group, the second student is also assigned to the second group and the third student to the first group, and so on, until both the numbers have occurred 25 times. It may be that any one number, either 1 or 2, may complete 25 frequencies earlier than the other. In such a situation, that number is subsequently ignored so that each group may consist of 25 subjects.

2. Each of the 50 students may be assigned a number from 1 to 50 on separate slips of equal size and the same colour. Subsequently, the slips may be folded in a similar way and placed in a box. After reshuffling, the numbers are drawn one by one by the experimenter. The first 25 numbers drawn may constitute the first group and the remaining 25 numbers drawn may constitute the second group.

3. Sometimes the students are allowed to take a seat in the classroom. When all the 50 students have taken their seats according to their choice, the experimenter may start from the left and go on counting until he has completed 25. This will constitute the first group. The remaining students may be counted off in a similar way to constitute the second group. This method of assignment of subjects into groups is not a completely random and unbiased one because it may produce a systematic relationship or correlation between the choice for sitting in the class (a subject variable) and the group to which the subject is assigned.

4. The simplest way of randomly assigning the subjects into groups is to arrange the subjects in the list in alphabetical order. Subsequently, the first student in the list is assigned to the first group; the second student to the second group; the third student to the first group, and so on.
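
The first procedure above (reading digits from a Table of Random Numbers and keeping only 1s and 2s) can be imitated in Python. This is a rough sketch under assumptions: a pseudo-random generator stands in for the printed table, and the function and variable names are hypothetical.

```python
import random


def assign_by_random_digits(students, rng, group_size=25):
    """Assign students (in list order) to group 1 or 2 by reading random digits.

    Digits other than 1 and 2 are ignored, and a digit is also ignored once
    its group already holds `group_size` students.
    """
    students = list(students)
    if len(students) > 2 * group_size:
        raise ValueError("more students than places in the two groups")
    groups = {1: [], 2: []}
    for student in students:
        while True:
            digit = rng.randint(0, 9)  # stand-in for the next digit in the table
            if digit in (1, 2) and len(groups[digit]) < group_size:
                groups[digit].append(student)
                break
    return groups


# Hypothetical alphabetical class list of 50 students.
class_list = [f"Student {i:02d}" for i in range(1, 51)]
groups = assign_by_random_digits(class_list, random.Random(7))
print(len(groups[1]), len(groups[2]))
```

Ignoring a digit once its group is full is what guarantees two groups of exactly 25, at the cost that the last few students are no longer assigned with equal probability to either group.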

The technique of sequential assignment is another method of assigning subjects to different groups. In this technique, the experimenter does not know the [...]. He is simply aware of the fact that a certain number of subjects will participate in the experiment. As the experiment in the randomized-group design usually continues for several days [...], the experimenter may run five subjects on the first day, two subjects on the second day, [...] subjects on the third day, and so on. Thus the experiment is conducted in a prearranged sequence. There are two main techniques of sequential assignment of subjects into the various groups, namely, complete randomization and block randomization.

The method of complete randomization as applied to the technique of captive assignment can also be extended to the technique of sequential assignment. Let us suppose that a total pool of 50 subjects is available to the experimenter for conducting a five-condition experiment, each condition having 10 subjects. Let us name the five conditions of the experiment as A, B, C, D and E. The simple way is to prepare 50 slips of the same size and colour and number them from 1 to 50 so that each slip has only one particular number. Of these 50 slips, 10 will have 'A' written or printed on them, 10 will have 'B' written or printed, 10 will have 'C', 10 will have 'D' and the remaining 10 will have 'E'. Subsequently, these 50 slips will be kept in a box. After stirring, the experimenter draws the slips one by one and lists them in the order drawn. Then, the first subject coming to the experimenter is assigned to the condition written on the first slip drawn, the second subject is assigned to the condition written on the second slip drawn, and so on. Suppose the first slip drawn contains C and the second slip contains D. It means that the first subject coming to the experimenter will be assigned to condition C, and the second subject coming to the experimenter will be assigned to condition D. By doing so the experimenter is obviously trying to avoid the relationship or correlation (which may bias the assignment) between the particular condition and the time when the subjects come to the laboratory. One general limitation of this technique is that sometimes, just by chance, there may occur some correlation between the assigned condition and the times when subjects appear. Suppose the 50 drawn sequences become like this: CCCCACCCDCCC ... EBBAABBBABBBB. If this type of sequence emerges, it is obvious that till the formation of the C condition group (of 10 subjects), B condition does not appear. B entries become very frequent towards the end of the series. This means that there is some correlation, though by chance, between B condition and the time succession. When the number of subjects is 30 or more in each group, such a type of sequence is very unlikely. But when N is 10 or less than 10, the probability of such a sequence is high. The remedy for such a [...] sequence is the technique of block randomization.
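
Complete randomization with slips can be imitated as follows. This is an illustrative sketch only; the function name is hypothetical, and shuffling a list stands in for stirring the box and drawing the slips one by one.

```python
import random
from collections import Counter


def complete_randomization(conditions, per_condition, rng):
    """Return the order of drawn 'slips': each condition appears per_condition times."""
    slips = [condition for condition in conditions for _ in range(per_condition)]
    rng.shuffle(slips)  # stirring the box and listing slips in the order drawn
    return slips


schedule = complete_randomization("ABCDE", 10, random.Random(11))
print("".join(schedule))           # the k-th subject to arrive gets the k-th condition
print(Counter(schedule))           # every condition still occurs exactly 10 times
print("".join(schedule[:10]))      # the first ten arrivals may be quite unbalanced
```

Nothing in this procedure prevents one condition from clustering early or late in the series, which is the limitation described above.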


The technique of block randomization may be defined as one in which each condition or value of the independent variable occurs once in each successive block of trials, but the [...] different from every other condition. To meet [...] it is essential that all conditions must occur an equal number of times. The number of times the conditions should occur is determined by the number of blocks. If, for example, there are six blocks, each condition will occur six times altogether. [...] numbers may be used. As an illustration of block randomization of five conditions of the experiment, the following sequence of schedule may be followed if each condition is to be presented 10 times:

CADBE; BADEC; CAEDB; BACED; ADCBE; EDBAC; ADECB; BEACD; DABCE; CDBAE

In the above preparation for block randomization, there are ten blocks and each condition occurs within each block once in a random order. In the above sequence, the order indicates the pairing of a condition with successively appearing subjects. In other words, the first subject will be assigned to condition C; the second subject to condition A; the third subject to condition D; the fourth subject to condition B; the fifth subject to condition E; the sixth subject to condition B, and so on, and the 50th subject to condition E.
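
A schedule of this kind can be generated with a few lines of Python. This is a minimal sketch under assumptions: the function name is hypothetical, each block is shuffled independently, and no attempt is made to force the orders of different blocks to differ from one another.

```python
import random


def block_randomization(conditions, n_blocks, rng):
    """Return n_blocks blocks; each block contains every condition once, in random order."""
    schedule = []
    for _ in range(n_blocks):
        block = list(conditions)
        rng.shuffle(block)  # random order of conditions within this block
        schedule.append(block)
    return schedule


blocks = block_randomization("ABCDE", 10, random.Random(5))
print("; ".join("".join(block) for block in blocks))

# Flatten the blocks: the k-th subject to arrive receives the k-th condition.
order = [condition for block in blocks for condition in block]
print("Subject 1 ->", order[0], "| Subject 50 ->", order[49])
```

Because every block of five consecutive subjects contains each condition exactly once, no condition can pile up at the beginning or the end of the series.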


After the experiment is over, the statistical analysis of the data is done, tp 5 
bworandomized-groups design, generally the ¢ test or the Mann-Whitney U test jg the 
appropriate statistic to follow. The details of calculation of these statistics in this lvpe 


of design 

appear in Chapter 23, 
The two-randomized-groups design has two important limitations. First, in the course of running an experiment based upon the two-randomized-groups design (or more-than-two-randomized-groups design) the experimenter usually loses some subjects. The problem posed by the differential loss of subjects after their random assignment to groups is serious and relates to whether or not the remaining subjects are true representatives of the original random or unbiased group. If the subjects are not representative of the original randomly assigned groups, it is obvious that the equivalent groups resulting from random assignment may no longer be really equivalent and then the conclusion drawn may not be generalized. Let us illustrate this point with an example. Suppose the experimenter wants to run a two-condition experiment. In one condition 20 trials per day up to six days are to be given, and in another condition 200 trials per day up to six days are to be given. A total of 40 subjects are randomly assigned to two groups (N = 20 in each), one meant for the 20-trials-per-day condition and another meant for the 200-trials-per-day condition. Further suppose that five of the original 20 subjects randomly assigned to the 200-trials-per-day condition do not turn up on the 4th, 5th and 6th day of the experiment, whereas all subjects assigned to the 20-trials-per-day condition turn up for all the six days. The two groups can still be compared through appropriate statistical tests, but the question arises as to why the experimenter has to face the loss of five subjects in one particular group. Was the task boring? Were the subjects not sufficiently motivated for the experimental task? Did they find it convenient to become ill? Whatever the reason, it is obvious that such subjects are not representative of the original unbiased group resulting from the random assignment and in such a situation, the groups formed from random assignment and said to be equivalent remain no longer equivalent.

Second, behavioural researches have two general purposes: one is to determine which of the many independent variables tends to influence the dependent variable most and another is to determine what type of relationship exists between the influential independent variable and the dependent variable. A two-randomized-groups design serves the first purpose but does not serve the second one. Since in such a design only two values of the independent variable take part in experimentation, it is difficult to state precisely the exact relationship between the influential independent variable and the dependent variable. For stating such a relationship, it is essential that more than two values of the independent variable are studied. Such a design is known as the more-than-two-randomized-groups design, whose discussion in detail is presented below.

More-Than-Two-Randomized-Groups Design

In view of some limitations of the two-randomized-groups design, it is being used in a restricted way and more emphasis is being given to the more-than-two-randomized-groups design. Such a design is also known as the multi-groups design. As its name implies, in such a design there are three or more conditions or values of the independent variable and accordingly, three or more groups of subjects participate in the experiment. The design is so called because the total subjects are randomly assigned to three or more unbiased groups. The process of captive assignment as well as that of sequential assignment (discussed in the previous section) may be utilized as the techniques of random assignment of subjects into three or more groups.


In psychological and educational researches, the use of the more-than-two-randomized-groups design is more common than that of the two-randomized-groups design because the former has three distinct advantages over the latter.


1. Suppose that an experimenter wants to know which of the four methods of teaching Russian (say, Method A, B, C, and D) is to be preferred. Suppose 80 students of the same age, intelligence and sex are available for this purpose. The researcher's first step would be to assign them randomly into four groups, each group having 20 subjects. These groups are supposed to be equivalent groups after random assignment. Subsequently, one group will be taught by method A; the second group by method B; the third group by method C; and the fourth group by method D. All subjects will be administered a test of Russian and, through the appropriate statistical techniques, we can answer the above question. This illustrates the more-than-two-randomized-groups design where all the four groups are used simultaneously. It is also possible to answer the question by conducting six separate two-groups experiments. For example, in one experiment subjects taught by method A would be compared with subjects taught by method B; in the second experiment, subjects taught by method A would be compared with subjects taught by method C; in a third experiment, subjects taught by method A would be compared with subjects taught by method D; in a fourth experiment, subjects taught by method B would be compared with subjects taught by method C; in a fifth experiment, subjects taught by method B would be compared with subjects taught by method D; and finally, in the sixth experiment subjects taught by method C would be compared with subjects taught by method D. Needless to say, this type of experimental design is painstaking, time-consuming and strenuous. Not only this, the problem of controlling relevant variables will also be serious and manifold. Even the experimenter may behave differently in the first experiment and in the last experiment due to fatigue or boredom. Therefore, we can say that the multi-groups design or the more-than-two-randomized-groups design is superior to the two-randomized-groups design.
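
The count of six two-groups experiments is simply the number of ways of choosing two methods out of four. The sketch below, in Python, lists those pairs and also shows one possible way of randomly assigning 80 subjects to four groups of 20; the function name `assign_randomly`, the subject numbering 1 to 80 and the seed are assumptions made only for illustration.

```python
import random
from itertools import combinations

methods = ["A", "B", "C", "D"]

# Every two-groups experiment compares one pair of methods: C(4, 2) = 6 pairs.
pairwise = list(combinations(methods, 2))
print(len(pairwise), pairwise)

def assign_randomly(subject_ids, group_names, seed=None):
    """Shuffle the subjects and deal them into equal-sized groups."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    size = len(ids) // len(group_names)
    return {name: ids[i * size:(i + 1) * size] for i, name in enumerate(group_names)}

# 80 hypothetical subjects numbered 1..80, 20 per teaching method.
groups = assign_randomly(range(1, 81), methods, seed=7)
print({name: len(members) for name, members in groups.items()})
```

With k methods the number of separate two-groups experiments grows as k(k - 1)/2, whereas the multi-groups design always needs a single experiment.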


2. In behavioural research it is, in general, regarded as a healthy practice to sample more values of the independent variable because this helps the investigator to evaluate better the influence of the independent variable upon the dependent variable. This purpose is served more satisfactorily by the multi-groups design than by the two-groups design. An example may illustrate this point clearly. Suppose the experimenter is interested in testing the hypothesis that the higher the hunger drive, the more correct responses a rat would make in a maze. For conducting an experiment of this type with a multi-groups design, he may take five groups of rats such that Group I has zero hour of food deprivation; Group II has three hours of deprivation; Group III has six hours of deprivation; Group IV has eight hours of deprivation; and Group V has ten hours of deprivation. If the experimenter wishes to conduct a two-groups design, the immediate problem is which of the five values of the independent variable he should choose. Though this depends upon the experimenter's choice, it is expected that a sensible experimenter would like to choose zero hour of deprivation (which would act like a control group) for one group, and the second group might be of a six-hour duration. Now, suppose that the result of the experiment based on the multi-groups design was that there was no significant mean difference in the correct responses of the zero-hour group, three-hours group, six-hours group and eight-hours group, but the ten-hours group was better than the first four groups because it produced a significantly higher mean (of the correct responses) than the first four groups. The conclusion would be that the greater hours of deprivation lead to more correct responses, although 0 hours to 8 hours of deprivation produce no difference in the means of the correct responses. When the two-groups design (0-hours and 6-hours deprivation groups) is used to study this problem, the conclusion would be that variation in the hours of deprivation does not affect the correct responses in the maze. This conclusion would then clearly be wrong. The reason for such a wrong conclusion is that only two values of the independent variable were sampled. In the multi-groups design, the possibility of drawing this type of conclusion is drastically minimized.
3. As stated in the discussion of the two-randomized-groups design, such a design fails to establish an adequate relationship between the influential independent variable and the dependent variable, probably because this type of relationship can be established only after several values of the independent variable are sampled and averaged. In the multi-groups design, it is easier to establish an adequate relationship between the independent variable and the dependent variable because the design utilizes several conditions or values of the independent variable. This point can be illustrated with the help of Figures 21.2 and 21.3. In Figure 21.2, the x-axis shows the increasing values of the independent variable and the y-axis shows the increasing values of the dependent variable. Group I was given two hours of practice on memorizing a list of nonsense syllables, and Group II was given eight hours of practice for the same task. It is obvious that the value of the dependent variable (or proportion of retention of the nonsense syllables) for Group I is less than the proportion or value of Group II. As this design tells nothing about what happens in between these two limits (two hours of practice and eight hours of practice), we are unable to conclude that any obvious relationship exists between the independent variable (hours of practice) and the dependent variable (retention of the task).

[Figures 21.2 and 21.3: proportion of retention (dependent variable) on the y-axis plotted against hours of practice (independent variable) on the x-axis.]

Fig. 21.2 Unknown relationship between dependent variable and independent variable obtained from an experiment based upon a two-groups design

Fig. 21.3 A linear relationship between independent variable and dependent variable obtained from an experiment

Suppose we add two more groups so that there are four groups with different hours of practice: 2 hours (Group I), 4 hours (Group II), 6 hours (Group III) and 8 hours (Group IV). The resulting data from this experiment have been plotted in Figure 21.3. It is obvious that each successive group has obtained a higher value on the dependent variable, that is, Group II obtains a higher value than Group I; Group III obtains a higher value than Group II; and Group IV obtains the highest value on the dependent variable. From this type of data, it appears that the four values of the independent variable are related in a certain way with the four values of the dependent variable. Obviously, the relationship is such that it can be easily explained with a straight line and hence, we can conclude that a linear relationship exists. Of course, there are other types of relationships, which can also be demonstrated. A frequently reported relationship is the nonlinear or curvilinear relationship between the dependent variable and the independent variable.
To summarize, the multi-groups design enables the experimenter to establish an adequate relationship between the dependent variable and the independent variable but the two-groups design lacks this trait.
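
The argument about the form of the relationship can be checked numerically. The following Python sketch fits a least-squares straight line to group means; the retention proportions used here are hypothetical values invented for illustration (they are not read from Figures 21.2 or 21.3), and `fit_line` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

hours = [2, 4, 6, 8]                       # hours of practice for Groups I-IV
linear = [0.30, 0.45, 0.60, 0.75]          # hypothetical retention, straight-line pattern
curved = [0.30, 0.60, 0.72, 0.75]          # hypothetical retention, curvilinear pattern

# Two-groups design: only the 2-hour and 8-hour groups are observed.
# Both patterns share these end points, so they cannot be told apart.
two_group_fit = fit_line([2, 8], [0.30, 0.75])

# Four-groups design: the intermediate values reveal the form of the relationship.
linear_fit = fit_line(hours, linear)
curved_fit = fit_line(hours, curved)

print("two groups :", two_group_fit)
print("four groups, linear pattern:", linear_fit)
print("four groups, curved pattern:", curved_fit)
```

A straight line always passes exactly through two points, so the two-groups fit is perfect whatever the true relationship is; only with the intermediate groups does the fit of the line become informative.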
In a multi-groups design, the two most common statistics applied are the analysis of variance and the Duncan Range test. For a detailed discussion of these statistics, particularly the Duncan Range test, readers are referred to Downie & Heath (1970) and Siegel (1956).
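
As a rough illustration of the first of these statistics, the sketch below computes the F ratio of a one-way analysis of variance in plain Python. The three groups of scores are hypothetical, `one_way_anova` is an illustrative name, and the Duncan Range test is not sketched here; the obtained F would still have to be compared with a tabled value for the given degrees of freedom.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical scores of three randomized groups.
scores = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_ratio, df_between, df_within = one_way_anova(scores)
print(f_ratio, df_between, df_within)
```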


Matched-Groups Design 

Like the randomized-groups design, the matched-groups design (also known as the randomized-block design) may be a two-matched-groups design or a more-than-two-matched-groups design. Whatever the type, in the matched-groups design all subjects are first tested on a common task or a pretest measure (also called the matching variable) and then they are formed into groups (as many as needed for the experiment) on the basis of their performance on the common task or the matching variable. The groups thus formed are said to be equivalent groups. Subsequently, the different conditions or values of the independent variable are introduced to each group. If these groups have equivalent means on the dependent variable before the experimental treatment is given and if a significant difference occurs after administering the experimental treatment and controlling the relevant variables, the resulting differences in the dependent variable may safely be attributed to the experimental treatment. Thus the matched-groups design is simply a way of establishing the fact that all the groups have equal dependent variable values prior to the administration of the experimental treatment.

The matched-groups design is based upon the principle that the experimental units (or subjects) can be formed into blocks or groups. As a matter of fact, a group of subjects said to be homogeneous with respect to the matching variable forms a block. It is expected that each block of subjects will be more equivalent or homogeneous on the dependent variable, in the absence of the experimental treatment, than subjects selected at random. In fact, each block of subjects is matched with respect to a matching variable. This is why the matched-groups design is also known as the randomized block design (Edwards 1968, 156), a term which has been borrowed from the field of agriculture in which the experimental unit is a plot of land and a block is defined as a strip of several adjacent plots. It is expected that plots which are adjacent to each other will be more homogeneous in fertility and other soil characteristics than equal numbers of plots selected at random from different fields. In psychological and educational researches, the experimental unit corresponding to a plot in agricultural research is the subject.

Now, two related problems arise in connection with the matching task: one is concerned with the selection of the matching variable and another with the ways or methods of matching.

Selection of the Matching Variable

As discussed above, in the matched-groups design subjects are measured on the matching variable prior to the administration of the experimental variable. The question is: how is one to select a matching variable? The most important characteristic on the basis of which a matching variable is selected is its ability to yield a high correlation with the experimental task or the dependent variable. If the matching variable is found to be highly correlated with scores on the experimental task or with scores on the dependent variable, the matching may be regarded as successful. On the other hand, if the matching variable yields a poor correlation with the dependent variable scores, the matching is not regarded as successful. Now the question is: how can the experimenter find a matching variable which highly correlates with the dependent variable? One obvious answer is to use the dependent variable itself as the matching variable. For example, suppose the experimenter is studying the effect of knowledge of results upon maze learning. There may be two groups of subjects: one group working with knowledge of results in the trials given to it and another group working without knowledge of results in the trials given to it. For assigning the subjects into two groups by the matching technique, the experimenter may give them six trials on the maze learning itself and obtain a set of scores. Subsequently, based upon the obtained scores the subjects may be paired off. The method used for dividing each pair into two groups is randomization. The random assignment of subjects is essential to prevent the operation of the experimenter's bias in matching. Then, these two groups may be given trials according to the experimental plan. The t test may be calculated between the two sets of scores obtained by these two matched groups as a measure of whether or not the knowledge of results has produced a significant mean difference.
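
Whether a candidate matching variable correlates highly with the dependent variable can be checked with the product-moment correlation. The Python sketch below is only illustrative: the two sets of scores are hypothetical and `pearson_r` is a helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores of six subjects.
matching_scores = [12, 15, 9, 20, 17, 11]      # e.g. initial trials on the maze
dependent_scores = [14, 18, 10, 22, 17, 13]    # later scores on the experimental task

r = pearson_r(matching_scores, dependent_scores)
print(round(r, 2))
```

A value of r close to +1 would suggest that the matching variable is a good basis for forming equivalent groups; a value near zero would suggest that matching on it is unlikely to help.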


The biggest advantage of using the initial trial performance score on the dependent variable is that the correlation between the matching variable and the dependent variable is undisputed, and hence, the success of matching is almost certain. But the experimenter may have to face a situation where initial trials on the dependent variable may not be used as the matching variable. In such a situation he will have to depend upon some other measure, which should be different from the dependent variable but highly correlated with it. Suppose that the experimenter is studying the rapidity (and not the effect of knowledge of results) involved in the process of tracing a stylus maze by a human subject. In such a situation, if the experimenter decides to match the subjects on the basis of scores of six trials given in maze tracing, this is likely to give some extra benefit to the subjects in tracing the maze when it is introduced as an experimental task (or the dependent variable), because such practice may accelerate the rapidity in tracing the maze. Likewise, if the experimenter is interested in studying the effect of practice on numerical ability, he cannot match subjects on the basis of the initial scores obtained from the solution of the arithmetical problems by the subjects in five or six trials, because in that case each subject would know the answers to the numerical problems when they (the arithmetical problems) are used as the experimental task. However, in such a situation there is one solution. The experimenter may use a different set of problems (than those used as the dependent variable) as a matching variable. If the test of numerical ability has a large number of problems, half of them may be used as the matching variable and the remaining half may be used as the measure of the dependent variable. In either of the solutions suggested above, the matching variable is expected to yield a high correlation with the dependent variable.

In matching, an independent measure can also be used as the matching variable, but the relationship between the independent measure and the dependent variable should have been established in previous researches or investigations. It can be said that the matched-groups design should not be used in a situation where the matching variable does not yield, or is not expected to yield, a high correlation with the dependent variable.


Methods of Matching 


Having selected a matching variable and obtained a set of scores earned by all subjects on the matching variable, the next step is to match them. There are two ways of matching: matching by pairs and matching on the basis of mean and standard deviation.

Matching by Pairs: One very convenient technique is matching by pairs. On the basis of the scores obtained by the subjects, the experimenter matches subjects in a way that each subject has a corresponding partner in the matched group or groups. An example may be given to illustrate matching by pairs. Suppose the experimenter wants to make a comparative study of the lecture method and the demonstration method upon problem-solving behaviour. There are 14 subjects available for the purpose. If the experimenter is to use the randomized-groups design, he may randomly assign 14 subjects into two groups, irrespective of what he knows about the subjects. But in using the matched-groups design, he must first match them on the matching variable. Let us suppose that intelligence test scores are used as the matching variable. All subjects are administered the intelligence test and their scores are obtained, which have been shown in Table 21.1. Now, the experimenter wants to construct two such groups (one for the lecture method and another for the demonstration method), which are equal on the intelligence test scores. For this, the experimenter chooses subjects who have equal scores. Thus subject no. 1 is paired with subject no. 9; subject no. 2 is paired with 12, and so on, and in this way, seven pairs are formed.


Table 21.1 Scores of 14 subjects on intelligence test (data are fictitious)

Subjects   Scores     Subjects   Scores
1          70         8          90
2          75         9          70
3          69         10         61
4          90         11         85
5          69         12         75
6          61         13         62
7          85         14         62





Table 21.2 Matched-groups formed on the basis of the scores of the matching variable

Lecture method group                 Demonstration method group
Subject No.   Intelligence score     Subject No.   Intelligence score
1             70                     9             70
2             75                     12            75
5             69                     3             69
8             90                     4             90
6             61                     10            61
11            85                     7             85
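
The pairing procedure can be sketched in Python as follows. The scores are those of Table 21.1, with subject 11 entered as 85, the value shown for it in Table 21.2 (an assumption where the two tables disagree); the function name `match_by_pairs` and the seed are illustrative. Each pair is split by a random draw, which plays the role of the coin toss.

```python
import random
from collections import defaultdict

# Intelligence test scores by subject number (Table 21.1; subject 11 assumed to be 85).
scores = {1: 70, 2: 75, 3: 69, 4: 90, 5: 69, 6: 61, 7: 85,
          8: 90, 9: 70, 10: 61, 11: 85, 12: 75, 13: 62, 14: 62}

def match_by_pairs(scores, seed=None):
    """Pair subjects with equal scores, then split every pair at random."""
    rng = random.Random(seed)
    by_score = defaultdict(list)
    for subject, score in scores.items():
        by_score[score].append(subject)

    lecture, demonstration, unmatched = [], [], []
    for score in sorted(by_score, reverse=True):
        members = by_score[score]
        while len(members) >= 2:
            pair = [members.pop(0), members.pop(0)]
            rng.shuffle(pair)                  # the "coin toss" within the pair
            lecture.append(pair[0])
            demonstration.append(pair[1])
        unmatched.extend(members)              # subjects without a partner
    return lecture, demonstration, unmatched

lecture, demonstration, unmatched = match_by_pairs(scores, seed=3)
for a, b in zip(lecture, demonstration):
    print(a, scores[a], "|", b, scores[b])
print("unmatched:", unmatched)
```

With these scores every subject finds a partner, so seven pairs are formed and the list of unmatched subjects is empty.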





Each pair, thus formed, constitutes a block. From these blocks, or pair-mates, the groups are randomly formed. By a simple toss of a coin, the experimenter may determine that subject no. 1 goes to the group to be taught by the lecture method and subject no. 9 goes to the group to be taught by the demonstration method. Sometimes, it has been reported that the assignment of subjects is completely left to the will of the experimenter. This is a dangerous practice because the element of bias may be introduced in the assignment. For example, the experimenter may place all those subjects who are highly motivated and interested in the experiment in the group taught by one particular method. But when random assignment is done, the experimenter's bias is automatically eliminated. The two matched groups formed on the basis of intelligence test scores have been shown in Table 21.2. Subsequently, both groups are given the problems to solve and the score obtained by them is written as shown in Table 21.3. A t test may be computed to determine whether the obtained mean difference between the two groups is significant. The method of calculation of t in the two-matched-groups design will be discussed in Chapter 23.

Table 21.3 Scores on a problem-solving task by pairs of subjects matched on the basis of intelligence test scores

Lecture method group        Demonstration method group
Subject No.   Score         Subject No.   Score
8             120           4             123
11            109           7             ?
6             95            10            99
?             80            ?             67
?             91            ?             80
?             75            ?             72
14            70            13            66

Computationally simpler than the t test is the A-test, which may also be used in the two-matched-groups design as a substitute for the t test (Sandler 1955). Note that for facilitation in statistical calculation the subjects have been ranked in terms of scores on the dependent variable, the pair securing the highest scores on the problem-solving task being kept at the top and the pair securing the lowest score on the problem-solving task being kept at the bottom.

There is, however, one limitation of the technique of matching by pair. If the experimenter insists on precise matching, the persons with deviant scores on the matching variable are likely to be eliminated. Say, for example, that subject nos. 1 and 9 are paired because each has an equal score, that is, 70. Suppose that any one of this pair has 65 instead of 70. Now in this situation they cannot be paired. Not only this, both the subjects would be treated as having deviant scores in the sense that there are no partners for them in the group. A situation like this is not uncommon when the experimenter is dealing with a larger group. This has a natural effect of shortening the original group. This is, however, not serious unless the experimenter wishes to generalize regarding the population from which the subjects were drawn. To overcome this problem, the experimenter may predetermine an allowance for a difference of 5 scores and then subject nos. 1 and 9 may be paired together despite the difference of 5 scores on the matching variable between them.
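
The t test for matched pairs and Sandler's A can both be computed from the pair differences D, with A defined as the sum of the squared differences divided by the square of their sum. The Python sketch below uses hypothetical problem-solving scores for seven matched pairs (not the entries of Table 21.3); `paired_t_and_a` is an illustrative name, and the obtained values would still have to be compared with tabled values for n - 1 degrees of freedom.

```python
from math import sqrt

def paired_t_and_a(first, second):
    """Return (t, A, df) for two lists of scores from matched pairs."""
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)

    mean_d = sum_d / n
    var_d = (sum_d2 - sum_d ** 2 / n) / (n - 1)   # unbiased variance of the differences
    t = mean_d / sqrt(var_d / n)

    a = sum_d2 / sum_d ** 2                       # Sandler's A = sum(D^2) / (sum D)^2
    return t, a, n - 1

# Hypothetical scores of seven matched pairs.
lecture_scores = [12, 15, 9, 14, 11, 13, 10]
demonstration_scores = [14, 18, 10, 17, 11, 15, 13]

t, a, df = paired_t_and_a(lecture_scores, demonstration_scores)
print(round(t, 3), round(a, 4), df)
```

The two statistics carry the same information, since A = (n - 1) / (n t^2) + 1/n; the A-test only saves the computation of the standard error.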


Matching in Terms of Mean and SD: Another way of matching the subjects is in terms of central tendencies and variabilities in the distribution of scores obtained on the matching variable. Mean and SD are the common measures of the central tendency and variability of the distribution respectively. On the basis of the mean and standard deviation of the scores obtained on the matching variable, the experimenter forms as many groups as needed for the experiment in a way that these means and standard deviations do not statistically differ. There are three methods of matching in terms of means and standard deviations, namely, the random-blocks method, the method of counterbalancing order and the block-repetition method.

In the random-blocks method the blocks are first formed. The number of subjects in each block is kept constant and is determined by the number of values of the independent variable. Subsequently, subjects from each block are randomly assigned to the different groups as required by the different conditions or values of the independent variable. Suppose the experimenter wants to conduct a three-conditions experiment. Naturally, then, the number of values of the independent variable would be three. So, there would be three groups of subjects, one for each condition. Let us name those groups as A, B and C, and assume that there are 24 subjects available for the purpose. Since there are three values or conditions of the independent variable, there would be three subjects per block. Thus a maximum of eight blocks will be prepared for 24 subjects. The scores of 24 subjects on a matching variable (say, intelligence test) are presented in Table 21.4. They have been arranged in decreasing order to facilitate the use of the above three techniques of matching.

Table 21.4 Scores for 24 subjects obtained on the matching variable and presented in decreasing order (data are fictitious)

Subject No.   Score     Subject No.   Score     Subject No.   Score
1             161       9             147       17            131
2             160       10            144       18            131
3             158       11            140       19            125
4             155       12            140       20            124
5             154       13            138       21            123
6             152       14            136       22            120
7             148       15            135       23            118
8             147       16            132       24            117

In using the technique of the random-blocks method, the experimenter may start from the top or bottom of the distribution. Let us start from the top. The first three subjects would form one block. Then these three subjects would be randomly assigned to groups A, B and C, with one subject to one group. Suppose the sequence of the random assignments is 2-1-3. The remaining subjects would be treated in the same fashion, that is, sequentially blocking them into groups of three subjects and subsequently assigning them randomly to each group in a manner that one subject is assigned to one group. The random order in the first block is 2-1-3 (1, 2, 3 stand for conditions A, B, C respectively), that is, the first subject was placed in condition B; the second subject in condition A and the third subject in condition C. A fresh random order would be determined for each following block and subjects would be, accordingly, assigned to the different groups. The resulting data has been shown in Table 21.5.


In the method of counterbalancing the order of matching, the sequence ABCCBA... is used to assign the subjects into three groups. Thus the first subject is assigned to group A; the second to group B; the third to group C; the fourth to group C; the fifth to group B; the sixth to group A; and the seventh to group A, and so on. It is assumed that subjects thus assigned into the different groups would be matched groups. The resulting data has been shown in Table 21.5.
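Table 21.5 Groups A, B and C formed by the three methods of matching, with their means and SDs

        Method of counterbalancing       Block-repetition method
        A         B         C            A         B         C
        161       160       158          161       160       158
        152       154       155          155       154       152
        148       147       147          148       147       147
        140       140       144          144       140       140
        138       136       135          138       136       135
        131       131       132          132       131       131
        125       124       123          125       124       123
        117       118       120          120       118       117
Means   139.000   138.750   139.250      140.375   138.750   137.875
SDs     13.60     13.55     13.17        13.41     13.55     13.25

[Random-blocks method columns (A, B, C): individual entries are not legible; the legible summary values are a mean of 139.375 and an SD of 13.93.]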
In the block-repetition method (so named by the author because a block of the required conditions is first formed and then successively repeated), a block of three conditions (or the required number of conditions) is made in the natural sequence and the same order of the sequence of conditions is repeated in each block. For example, ABC is one block, which appears in a natural sequence, and the same order would be maintained in each of the eight blocks. Thus the order ABC would be repeated in each block. The sequence indicates that the first subject is assigned to condition A; the second to condition or group B; the third to condition C; the fourth to condition A, and so on. The resulting data has been shown in Table 21.5. The means and standard deviations for each column in Table 21.5 have been given at the bottom. It is obvious that there occur greater mean differences in the block-repetition method as compared to the mean differences in the random-blocks method. It can, therefore, be said that the block-repetition method should be the least preferred method and the random-blocks method should be the most preferred method for matching.
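
The three methods can be compared on the scores of Table 21.4 with a short Python sketch. The scores below are entered as read from that table, with subject 3 taken as 158 and subject 24 as 117 (assumptions where the printed digits are unclear); the SD is computed with N in the denominator, and all function names and the seed are illustrative.

```python
import random
from statistics import mean, pstdev

# Matching-variable scores of subjects 1..24 in decreasing order (Table 21.4, as read).
SCORES = [161, 160, 158, 155, 154, 152, 148, 147,
          147, 144, 140, 140, 138, 136, 135, 132,
          131, 131, 125, 124, 123, 120, 118, 117]

def random_blocks(scores, k=3, seed=None):
    """Form successive blocks of k subjects; assign each block to groups in a random order."""
    rng = random.Random(seed)
    groups = [[] for _ in range(k)]
    for start in range(0, len(scores), k):
        order = list(range(k))
        rng.shuffle(order)                       # fresh random order for every block
        for score, g in zip(scores[start:start + k], order):
            groups[g].append(score)
    return groups

def counterbalanced(scores, k=3):
    """Assign subjects in the repeating order A B C C B A ..."""
    pattern = list(range(k)) + list(range(k - 1, -1, -1))
    groups = [[] for _ in range(k)]
    for i, score in enumerate(scores):
        groups[pattern[i % len(pattern)]].append(score)
    return groups

def block_repetition(scores, k=3):
    """Assign subjects in the repeating order A B C A B C ..."""
    groups = [[] for _ in range(k)]
    for i, score in enumerate(scores):
        groups[i % k].append(score)
    return groups

def summarize(groups):
    """Mean (3 decimals) and SD (2 decimals) of each group."""
    return [(round(mean(g), 3), round(pstdev(g), 2)) for g in groups]

rb_groups = random_blocks(SCORES, seed=5)
print("random blocks    :", summarize(rb_groups))
print("counterbalancing :", summarize(counterbalanced(SCORES)))
print("block repetition :", summarize(block_repetition(SCORES)))
```

In the block-repetition method group A always receives the highest score of each block and group C the lowest, which is why its group means drift apart; the random-blocks result depends on the random orders actually drawn.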
One question may be raised here. Which method of matching should the experimenter follow? Should he match on the basis of means and standard deviations or should he match by pairs? There is no fixed answer to this question. It all depends upon the experimenter's purpose: whether he wants statistical precision in matching or he can tolerate differences in the ability levels of subjects. In the former case, he should adhere to matching by pairs and in the latter situation he should resort to matching on the basis of means and standard deviations.


Which Design to Prefer: Randomized-Groups or Matched-Groups?

If the experimenter has decided to use independent groups in his experiment, should he use the randomized-groups design or the matched-groups design? Different experts would give different answers to this question, which in itself is evidence that there is no strong point in favour of choosing one over the other. A review of the literature clearly shows that the randomized-groups design has been used more frequently than the matched-groups design. There are two probable reasons for this. First, in the matched-groups design the inconvenience of the experimenter is increased because of two factors. One is that it is difficult to obtain a suitable matching variable and obtain a set of scores upon it. Another difficulty arises when the experimenter is using the initial trial performance score as the matching variable. If the problem is to be investigated within a laboratory, the experimenter will have to bring the subjects twice to the laboratory, the first time for obtaining initial scores and the second time for obtaining scores after introducing the independent variable. This requirement of keeping the subjects present twice in the laboratory is troublesome. For the reasons stated above, the experimenter should prefer the randomized-groups design to the randomized-block design or the matched-groups design. Second, many statistically advanced techniques can more easily be applied with the randomized-groups design than with the matched-groups design. So, the former has been preferred to the latter.

Apart from this, in the matched-groups design the experimenter has fewer degrees of freedom (df) than in the randomized-groups design if the total number of subjects (N) is equal in both the designs. For example, suppose the experimenter has 20 subjects. In the two-matched-groups design there will be 10 pairs of subjects and accordingly, the degrees of freedom will be equal to 10 - 1 = 9. But in the randomized-groups design, there would be 20 - 2 = 18 degrees of freedom. If the experimenter is to use the t test as a measure of significance of the mean difference after the experimental treatment, it may be that the same value of t is significant with the randomized-groups design but not significant with the matched-groups design, because the greater the number of degrees of freedom, the smaller the value of t required for significance. This fact may be illustrated with an example. Suppose the experimenter gets 2.20 as the value of the t test in an experiment having two groups, each consisting of 10 subjects. If the value of the t test is interpreted on the basis of df calculated in the randomized-groups design, that is, 20 - 2 = 18, we find that this may be taken as significant at the 0.05 level. The required value of t at the said level for the said df is 2.10. But the same value of t (that is, 2.20) is not significant at the 0.05 level if it is interpreted on the basis of df calculated in the matched-groups design (that is, 10 - 1 = 9) because the required value of t at this df is 2.26. For this reason, the matched-groups design suffers a disadvantage as compared to the randomized-groups design.
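The effect of df on the significance of the same t value can be checked numerically. The following is only a sketch and not part of the original text: it assumes a two-tailed test, the helper names `t_pdf` and `two_tailed_p` are hypothetical, and the tail area is obtained by integrating Student's t density with Simpson's rule so that nothing beyond the Python standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p value for an observed t, via Simpson's rule on [0, |t|]."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3   # P(0 < T < |t|)
    return 1 - 2 * central_area    # area in both tails together

t_obtained = 2.20
p_randomized = two_tailed_p(t_obtained, df=20 - 2)  # two independent groups of 10
p_matched = two_tailed_p(t_obtained, df=10 - 1)     # 10 matched pairs

print(f"df = 18: p = {p_randomized:.3f}")
print(f"df =  9: p = {p_matched:.3f}")
```

With 18 df the p value falls below 0.05, while with 9 df it stays above 0.05, which is the point made in the paragraph above.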


If there is a high and positive correlation between the matching variable and the dependent variable, our matching would be most successful. In such a case, the groups will be highly homogeneous or equivalent, and a sharp distinction is then likely to occur between the groups after the experimental treatment is given. In such a situation, the value of t is likely to be high so that the null hypothesis can be rejected at a higher level of confidence. In a nutshell, if there is a high and positive correlation between the matching variable and the dependent variable, the value of t may be increased. And if the correlation is high, the experimenter can afford to lose some degrees of freedom by matching and then, he should prefer the matched-groups design to the randomized-groups design. If in reality this is not so, the experimenter should prefer the randomized-groups design to the matched-groups design. It may


also be said that the loss of df is a real consideration which goes against the matched-groups design only when N is small. If N is large, the experimenter would not lose much by matching subjects (even in the case of a small correlation between the matching variable and the dependent variable) because, with greater df, there is only a very small difference between the values of t required for significance. We can, then, state that with larger N (and therefore, with higher df) the matched-groups design will yield significant statistical results (like the t test) similar to those found by the randomized-groups design.

Factorial Design

So far we have discussed the randomized-groups design and the randomized-block or matched-groups design, which are appropriate for studying a single independent variable at a time. If the independent variable is varied in two ways (and thus there are two values of the independent variable), the two-randomized-groups or the two-matched-groups design is recommended, and when a single independent variable is varied in more than two ways (and thus there are more than two values of the independent variable), the more-than-two-randomized-groups or the more-than-two-matched-groups design (together called multi-groups design) is recommended. It is commonly observed that in many behavioural researches, the investigator is faced with a problem in which he is required to manipulate two or more than two independent variables simultaneously. The experimental design suited to such a situation is technically known as the factorial design. A factorial design, then, may be defined as a design in which the selected values of two or more independent variables are manipulated in all possible combinations so that their independent as well as interactive effects upon the dependent variable may be studied. One requirement of a factorial design is that the different subgroups or subjects work under all possible combinations of the independent variables of the design. In brief, a factorial design has the following three characteristics:

1. Two or more independent variables are manipulated in all possible combinations.
2. For a design to be called factorial, different subgroups or subjects must serve under every possible combination of the independent variables. As far as possible an equal number of subjects in all subgroups is preferred, although this is not an essential condition for a factorial design. An equal number of subjects is preferred because it facilitates statistical computations.
3. The factorial design enables the experimenter to study the independent as well as the interactive effect of the two or more independent variables.

Let us take an example to illustrate the meaning of factorial design. Suppose the experimenter wants to know: Do noise and illumination affect the rate of learning of a list of 15 consonant syllables? Obviously, there are two independent variables: one is noise and the other is illumination. Let us denote the first independent variable as A and the second independent variable as B. The dependent variable is the rate of learning. Further suppose that noise is manipulated in two ways: high-noise condition and low-noise condition. Thus there are two levels of A. Let the high-noise condition be called A₁ and the low-noise condition A₂. Likewise, illumination is manipulated in two ways: the high-illumination level and the low-illumination level. Thus there are two levels of B also. Let the high-illumination level be called B₁ and the low-illumination level be called B₂. Thus we see that both the independent variables have been manipulated in two ways. The resulting factorial design is 2 × 2. There will be a maximum of four possible combinations of the experimental conditions generated by these two manipulated variables. These experimental conditions are A₁B₁ (high-noise as well as high-illumination level condition), A₂B₁ (low-noise as well as high-illumination level condition), A₁B₂ (high-noise as well as low-illumination level condition) and A₂B₂ (low-noise as well as low-illumination level condition). These four possible combinations have been shown in Table 21.6. When the possible combinations of the independent variables have been prepared, the next step is to randomly assign the subjects to different treatment combinations. Let us suppose there are 40 subjects available for the purpose. So, these 40 subjects will be randomly assigned to the four cells of Table 21.6. The experimenter may consult a table of random numbers and go on assigning the subjects to the groups according to the sequence of digits in the table. The process would be repeated until each of the four numbers (I, II, III and IV) has been repeated ten times. Suppose the sequence of digits in the table of random numbers is 26438. It means that subject no. 1 will be assigned to group II; subject no. 2 will be assigned to group IV; and subject no. 3 will be assigned to group III. The digits 6 and 8 will be dropped as we have only four groups. When the 40 subjects have been randomly assigned to four subgroups, the four treatment combinations are also randomly assigned so that each group receives one treatment combination.

Randomization of subjects into different experimental conditions sometimes becomes a difficult task. This is particularly true when the organismic variables like age, sex, achievement, intelligence, anxiety, etc., are used as the independent variables. Ideally, for complete randomization of subjects in factorial design, independent variables should belong to the category of active variables. In the above-described experiment, the independent variables belong to the category of active variables. In a 2 × 2 factorial design, sometimes it may be found that A is an active variable and B is the organismic variable or attribute variable. Let us say that B stands for the sex variable and A stands for noise in any 2 × 2 factorial experiment. The two levels of B are males (B₁) and females (B₂), and the two levels of A are high-noise condition (A₁) and the low-noise condition (A₂). Since B is the organismic variable, we cannot assign subjects to B₁ and B₂ at random. However, all male subjects may be randomly assigned to A₁B₁ and A₂B₁ experimental conditions. Likewise, all female subjects may randomly be assigned to A₁B₂ and A₂B₂ experimental conditions. If A, like B, also happens to be an organismic variable, it is risky to …

Table 21.6 A 2 × 2 factorial design for two independent variables

                           Noise (A)
Illumination (B)     High (A₁)     Low (A₂)
High (B₁)            A₁B₁          A₂B₁
Low (B₂)             A₁B₂          A₂B₂

Types of Factorial Design

A factorial design is directly classified on the basis of the number of independent variables used. However, we shall limit our discussion to factorial designs with up to three variables. For a discussion of factorial designs using more than three variables, the reader is referred to Edwards (1968), Winer (1971), Ostle & Mensing (1975).

Factorial Design with Two Independent Variables

The type of factorial design that we have discussed above is one in which there are two independent variables, each having two levels. Hence, it was referred to as a 2 × 2 factorial design. Likewise, a factorial design with two independent variables may be a 3 × 2, 3 × 3 or 4 × 3 design, and so on. In this way a generalized factorial design for two independent variables may be a K × L factorial design in which K stands for the first independent variable and L stands for the second independent variable. The values of K and L indicate the number of ways in which the first and the second independent variables have been manipulated. These ways are known as levels of the independent variable. Thus, a 3 × 2 factorial design indicates that the first independent variable has three levels and the second independent variable has two levels; such a design would have six experimental conditions. A 3 × 3 design, similarly, indicates that each of the two independent variables has three levels; such a design would have nine experimental conditions or treatment combinations.
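The two ideas above, listing every cell of a K × L design and randomly assigning an equal number of subjects to each cell, can be sketched in a few lines. This is an illustrative sketch only: the function names `treatment_combinations` and `randomly_assign` are hypothetical, and a shuffled list stands in for the table of random numbers described in the text.

```python
import itertools
import random

def treatment_combinations(levels_a, levels_b):
    """All cells of a K x L factorial design: every level of A with every level of B."""
    return list(itertools.product(levels_a, levels_b))

def randomly_assign(subjects, cells, seed=None):
    """Shuffle the subjects and deal them out so that every cell gets an equal share."""
    subjects = list(subjects)
    if len(subjects) % len(cells) != 0:
        raise ValueError("number of subjects must be a multiple of the number of cells")
    rng = random.Random(seed)
    rng.shuffle(subjects)
    per_cell = len(subjects) // len(cells)
    return {cell: subjects[i * per_cell:(i + 1) * per_cell]
            for i, cell in enumerate(cells)}

# The 2 x 2 noise-by-illumination design with 40 subjects numbered 1 to 40.
cells = treatment_combinations(["A1", "A2"], ["B1", "B2"])
groups = randomly_assign(range(1, 41), cells, seed=0)
for cell, members in groups.items():
    print(cell, sorted(members))
```

Calling `treatment_combinations` with three and two levels gives the six cells of a 3 × 2 design, and with three and three levels the nine cells of a 3 × 3 design.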
The simplest factorial design with two independent variables is the 2 × 2 design. Hence, we shall discuss this design in detail by referring to the experiment cited earlier. This experiment had two independent variables and one dependent variable. The two independent variables are noise and illumination. Here the experimenter would have three basic questions before him:

1. Does noise affect the rate of learning?
2. Do levels of illumination affect the rate of learning?
3. Is there any interaction between the noise and the levels of illumination?

Had the experimenter been interested in only the first two questions, a 2 × 2 factorial design would not have been necessary. These questions can be answered directly by the two-randomized-groups design. Since the experimenter is also interested in knowing simultaneously the interaction of the noise and the illumination with respect to the dependent variable scores, a factorial design becomes a necessity.
Before we go to the details of a 2 × 2 factorial design, let us consider the concept of interaction from a more scientific point of view. First, let us illustrate the concept of interaction with an example. Suppose the problem is: Which of the two is more convenient for a person, riding a larger-wheel bicycle or riding a smaller-wheel bicycle? Obviously, the immediate answer will be one of the two: (a) a larger-wheel bicycle, or (b) a smaller-wheel bicycle. However, in addition to these answers, there is the possibility of a third answer, that is, 'it depends upon the body build.' In other words, a tall person may find riding a larger-wheel bicycle more convenient than a smaller-wheel bicycle but a short person may find riding a smaller-wheel bicycle more convenient than a larger-wheel bicycle. It is obvious, thus, that the third answer depends upon the body build of the person who is riding the bicycle. This is the basic notion of interaction. We can say the act of riding a smaller-wheel bicycle or a larger-wheel bicycle interacts with the body build of the rider. Thus an interaction may be said to exist between the two independent variables when changes produced in the dependent-variable score by one independent variable are also determined by the other independent variable. In the experiment planned above, if interaction exists between the noise (one independent variable) and the level of illumination (another independent variable), it means that the rate of learning under the different conditions of noise would depend upon the levels of illumination found in those conditions or the rate of learning under different levels of illumination would depend upon the different conditions of noise. If there is no interaction between noise and illumination, it means changes in the rate of learning under two conditions of noise are not dependent upon the two levels of illumination or the changes in the rate of learning under two levels of illumination are not dependent upon the two conditions of noise.
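The same idea can be put in symbols. The notation here is an assumption introduced only for illustration and is not the author's: let \mu_{jk} be the population mean of the dependent variable under level A_j of noise and level B_k of illumination. For a 2 × 2 design, the absence of interaction means that the effect of A is the same at both levels of B:

```latex
\[
(\mu_{11} - \mu_{21}) - (\mu_{12} - \mu_{22}) = 0
\quad\Longleftrightarrow\quad
(\mu_{11} - \mu_{12}) - (\mu_{21} - \mu_{22}) = 0 .
\]
```

Any departure of this difference of differences from zero is what the interaction term of the analysis is meant to detect.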


Statistical Analysis in a 2 × 2 Factorial Design

The statistic that is frequently applied to any factorial design is the Analysis of Variance (ANOVA), which was originally developed by Sir R. A. Fisher. Let us assume that the experimenter got the data (fictitious) shown in Table 21.7 from the learning experiment described above. The table shows the number of trials taken by each subject in learning a list of 15 consonant syllables with the criterion of one perfect recitation. The method of serial learning was followed by the experimenter. The data in Table 21.7 have been rearranged in Table 21.8 so that the means of the groups may be put in their appropriate cells.

Table 21.7 Fictitious data obtained in an experiment based upon a 2 × 2 factorial design

          Group I (n₁ = 10)     Group II (n₂ = 10)    Group III (n₃ = 10)   Group IV (n₄ = 10)
          High noise and        Low noise and         High noise and        Low noise and
          high illumination     high illumination     low illumination      low illumination
          15                    10                    16                    10
          14                    12                    18                    9
          20                    10                    22                    8
          22                    9                     25                    7
          16                    8                     26                    6
          18                    12                    20                    10
          20                    11                    20                    11
          21                    10                    18                    12
          18                    10                    17                    10
          17                    10                    16                    10
ΣX   =    181                   102                   198                   93
ΣX²  =    3339                  1054                  4034                  895
Mean =    18.1                  10.2                  19.8                  9.3

ΣX = Total score on the dependent variable

Table 21.8 Means for the four experimental conditions

                                  Noise (A)
Illumination (B)     High (A₁)          Low (A₂)           Means
High (B₁)            (A₁B₁)             (A₂B₁)
                     ΣX = 181           ΣX = 102
                     Mean = 18.1        Mean = 10.2        14.15
                     n₁ = 10            n₂ = 10
Low (B₂)             (A₁B₂)             (A₂B₂)
                     ΣX = 198           ΣX = 93
                     Mean = 19.8        Mean = 9.3         14.55
                     n₃ = 10            n₄ = 10
Means                18.95              9.75
Table 21.9 Illustration of the details of the calculation of ANOVA in a 2 × 2 factorial design (data from Table 21.7)

Step 1: Correction term

\[ C = \frac{(\Sigma X_1 + \Sigma X_2 + \Sigma X_3 + \Sigma X_4)^2}{N} = \frac{(181 + 102 + 198 + 93)^2}{40} = \frac{(574)^2}{40} = 8236.9 \]

Step 2: Total SS

\[ \text{Total SS} = (\Sigma X_1^2 + \Sigma X_2^2 + \Sigma X_3^2 + \Sigma X_4^2) - C = (3339 + 1054 + 4034 + 895) - 8236.9 = 9322 - 8236.9 = 1085.1 \]

Step 3: Among SS

\[ \text{Among SS} = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \frac{(\Sigma X_3)^2}{n_3} + \frac{(\Sigma X_4)^2}{n_4} - C = \frac{(181)^2}{10} + \frac{(102)^2}{10} + \frac{(198)^2}{10} + \frac{(93)^2}{10} - 8236.9 \]
\[ = \frac{32761 + 10404 + 39204 + 8649}{10} - 8236.9 = 9101.8 - 8236.9 = 864.9 \]

Step 4: Within SS

\[ \text{Within SS} = \text{Total SS} - \text{Among SS} = 1085.1 - 864.9 = 220.2 \]

Step 5: SS between the two levels of A (the first independent variable)

\[ SS_A = \frac{(\Sigma X_1 + \Sigma X_3)^2}{n_1 + n_3} + \frac{(\Sigma X_2 + \Sigma X_4)^2}{n_2 + n_4} - C = \frac{(181 + 198)^2}{10 + 10} + \frac{(102 + 93)^2}{10 + 10} - 8236.9 \]
\[ = \frac{(379)^2}{20} + \frac{(195)^2}{20} - 8236.9 = 9083.3 - 8236.9 = 846.4 \]

Step 6: SS between the two levels of B (the second independent variable)

\[ SS_B = \frac{(\Sigma X_1 + \Sigma X_2)^2}{n_1 + n_2} + \frac{(\Sigma X_3 + \Sigma X_4)^2}{n_3 + n_4} - C = \frac{(181 + 102)^2}{10 + 10} + \frac{(198 + 93)^2}{10 + 10} - 8236.9 \]
\[ = \frac{(283)^2}{20} + \frac{(291)^2}{20} - 8236.9 = 8238.5 - 8236.9 = 1.6 \]

Step 7: Interaction SS

\[ SS_{A \times B} = \text{Among SS} - SS_A - SS_B = 864.9 - 846.4 - 1.6 = 864.9 - 848.0 = 16.9 \]
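The seven steps of Table 21.9 can be retraced from the group totals alone. The sketch below is an illustration rather than the author's procedure; the dictionary keys and the helper name `main_effect_ss` are hypothetical, while the totals and sums of squared scores are the ΣX and ΣX² rows of Table 21.7.

```python
# Group totals and sums of squared scores, as printed in Table 21.7.
totals = {"A1B1": 181, "A2B1": 102, "A1B2": 198, "A2B2": 93}
sum_sq = {"A1B1": 3339, "A2B1": 1054, "A1B2": 4034, "A2B2": 895}
n = 10                   # subjects per group
N = n * len(totals)      # 40 subjects in all

correction = sum(totals.values()) ** 2 / N                        # Step 1
total_ss = sum(sum_sq.values()) - correction                      # Step 2
among_ss = sum(t ** 2 / n for t in totals.values()) - correction  # Step 3
within_ss = total_ss - among_ss                                   # Step 4

def main_effect_ss(factor_levels):
    """SS for one factor: pool the group totals that share each level."""
    ss = 0.0
    for level in factor_levels:
        pooled = sum(t for cell, t in totals.items() if level in cell)
        count = n * sum(1 for cell in totals if level in cell)
        ss += pooled ** 2 / count
    return ss - correction

ss_a = main_effect_ss(["A1", "A2"])   # Step 5
ss_b = main_effect_ss(["B1", "B2"])   # Step 6
ss_ab = among_ss - ss_a - ss_b        # Step 7

for name, value in [("C", correction), ("Total", total_ss), ("Among", among_ss),
                    ("Within", within_ss), ("A", ss_a), ("B", ss_b), ("A x B", ss_ab)]:
    print(f"{name:>6}: {value:.1f}")
```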


The initial steps involved in calculation of the analysis of variance of a factorial design are the same as those found for a randomized-groups design (see Chapter 23). The first step is to compute the total sum of squares (SS) and subsequently, divide it into the among SS (see Step 3 of Table 21.9) and the within SS (see Step 4 of Table 21.9). Prior to computing the total sum of squares, we need to compute the correction value (see Step 1). The among SS gives an indication regarding whether or not the groups differ. If one is simply interested in knowing whether or not the different groups under study differ significantly, one need not undertake a factorial design. An overall F test will answer this question. But as has been stated earlier, in the above experiment there were specific purposes. The experimenter was interested in whether or not variation in each independent variable affected the dependent-variable score and whether there was any significant interaction. Hence, it is essential to divide the among SS into three components: between SS for A, between SS for B, and the A × B interaction. In order to know whether or not there is a significant difference between high-noise condition (A₁) and low-noise condition (A₂), the SS between these two values was computed (see Step 5). Likewise, for knowing whether or not there is a significant difference between the high level of illumination (B₁) and the low level of illumination (B₂), the SS between these two values was computed (see Step 6). The A × B interaction was easily calculated by subtracting the values of between SS for A and between SS for B from the among SS (see Step 7). The calculation of the interaction SS completes the computation of the different sums of squares. It must be noted that none of the numerical values from Step 1 through Step 7 can be negative. In case a negative value comes at any of these steps, the experimenter should correct the error before he proceeds ahead. After the computation of the different sums of squares is over, the next step is to prepare a complete summary for the analysis of variance as given in Table 21.10.


Table 21.10 A summary of the complete ANOVA in a 2 × 2 factorial design

Source of variation               Sum of squares    df    Mean square or variance    F
Among all groups                  864.9             3     288.3                      47.11*
Within groups                     220.2             36    6.12
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Between noise (A₁, A₂)            846.4             1     846.4                      138.30
Between illumination (B₁, B₂)     1.6               1     1.6                        0.26
Interaction (A × B)               16.9              1     16.9                       2.76
Within treatments (error)         220.2             36    6.12
Total                             1085.1            39

* Not essential for factorial design.

The entries above the dotted line in Table 21.10 may be omitted in a factorial analysis of variance. However, if the purpose of the experimenter is to know the overall differences among all the groups, they should be retained. The sums of squares for all the six rows of Table 21.10 have already been calculated in Table 21.9. The formulas for calculating dfs are as follows:


\[ df \text{ for among (or between) all groups} = K - 1 \tag{21.1} \]
\[ df \text{ for within groups or within treatments} = N - K \tag{21.2} \]
\[ \text{Total } df = N - 1 \tag{21.3} \]

In Equations 21.1 to 21.3, K refers to the number of groups and N refers to the total number of subjects participating in the experiment. In the above example N = 40 and K = 4. That is why df for among SS is K - 1 = 4 - 1 = 3, which is based upon all the four groups; and df for within SS is N - K = 40 - 4 = 36. As the among SS is divided into three parts, namely, between SS for A (first independent variable), between SS for B (second independent variable) and interaction SS, so also the df for among SS is divided into three corresponding parts. There are two groups for A. Hence, df for A is K - 1 = 2 - 1 = 1. Similarly, there are two groups for B and accordingly, df for B, by the same formula, is 1. The A effect and B effect are called the main effects. Now for the interaction df which, in a 2 × 2 factorial design, is equal to df for A × df for B. Hence, 1 × 1 = 1. This is an example of a two-variable interaction, which is called a first-order interaction. Similarly, a three-variable interaction (A × B × C) and a four-variable interaction (A × B × C × D) are known as second-order interaction and third-order interaction respectively. As a check of calculation, when the dfs for A effect, B effect, interaction effect and within treatments are added together they should yield a value equal to N - 1, which becomes the total df. In the randomized-groups design, the df for the among SS plus the df for within treatments should be equal to N - 1, which is again the total df. The mean square or variance in each of the six rows of Table 21.10 has been obtained by dividing the respective sum of squares by the respective df. For example, the row for within treatments (or within groups) has 220.2 as sum of squares and 36 as df. When the former is divided by the latter, we get 6.12 as a mean square. Assuming a fixed model, the values of four F ratios have been calculated by using the within variance as a denominator in all cases. For example:

\[ F_{\text{among}} = \frac{288.3}{6.12} = 47.11 \]
\[ F_A = \frac{846.4}{6.12} = 138.30 \]
\[ F_B = \frac{1.6}{6.12} = 0.26 \]
\[ F_{A \times B} = \frac{16.9}{6.12} = 2.76 \]
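The df check and the F ratios can be reproduced mechanically from the sums of squares. The sketch below is illustrative only; the dictionary labels are hypothetical names chosen here. It rounds the error mean square to 6.12 before dividing, because that is how the values in the text appear to have been obtained.

```python
N, K = 40, 4                   # total subjects, number of groups
levels_a, levels_b = 2, 2      # levels of noise and of illumination

sums_of_squares = {
    "Among all groups": 864.9,
    "Between noise (A)": 846.4,
    "Between illumination (B)": 1.6,
    "Interaction (A x B)": 16.9,
    "Within treatments": 220.2,
}
df = {
    "Among all groups": K - 1,
    "Between noise (A)": levels_a - 1,
    "Between illumination (B)": levels_b - 1,
    "Interaction (A x B)": (levels_a - 1) * (levels_b - 1),
    "Within treatments": N - K,
}

# Check of calculation: dfs for A, B, A x B and within treatments sum to N - 1.
component_df = sum(v for k, v in df.items() if k != "Among all groups")
assert component_df == N - 1

mean_square = {k: sums_of_squares[k] / df[k] for k in df}
ms_error = round(mean_square["Within treatments"], 2)   # 6.12, as in the text
f_ratio = {k: round(mean_square[k] / ms_error, 2)
           for k in df if k != "Within treatments"}

for source, f in f_ratio.items():
    print(f"{source:<26} df = {df[source]}  F = {f:.2f}")
```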


Interpretation of F ratios: There were three specific null hypotheses in the 2 × 2 factorial design described above.

1. There is no difference between the means of the two conditions of noise. In other words, the means of A₁ and A₂ do not differ.
2. There is no difference between the means of the two conditions of illumination. In other words, the means of B₁ and B₂ do not differ.
3. There is no interaction between the two independent variables, that is, between noise and illumination.

These three null hypotheses were tested by the last three Fs. The first F is intended to test a general null hypothesis which indicated no difference between all the four groups under study. As it can be avoided in a factorial design, let us drop its discussion. The interested reader can interpret this F in the same way as for the F test in a randomized-groups design (see Chapter 23).

Let us turn to the first null hypothesis, which relates to the effect of noise upon the rate of learning. Let the alpha level for each F test be 0.05. For the F relating to A effect, we have 1 df for the numerator and 36 df for the denominator. The value of F required at the 0.05 level of significance with 1 and 36 df is 4.11 (see Edwards 1968*). Since the obtained F of 138.30 exceeds this value, we should reject the null hypothesis and conclude that the two conditions of noise, averaged over the two levels of B, produced a significant difference in the rate of learning. As the mean for high-noise condition (18.95) is larger than the mean for low-noise condition (9.75), the experimenter may conclude that high noise retards the rate of learning because the subjects took more trials to reach the criterion. The second null hypothesis relates to the effect of illumination upon the rate of learning. Again, we have 1 df for the numerator and 36 df for the denominator for the obtained F of 0.26. As this F is less than 1, without consulting the probability table for F, it can immediately be taken to be not significant. Hence, the second null hypothesis is accepted. We conclude, therefore, that the two conditions of illumination, averaged over the two levels of A, are not producing a significant difference in the rate of learning. Thus the obtained mean difference between B₁ and B₂ is due to the chance factor.

After the interpretation of the main effects is over, the interpretation of interaction effects is taken up. In a 2 × 2 factorial design, there is only a first-order interaction. The interaction F is 2.76 and we have 1 df for the numerator and 36 df for the denominator for this F. As this value of F is less than the required value of F (that is, 4.11) to be significant at the 0.05 level, we accept our third null hypothesis, too. As the null hypothesis for A × B interaction is accepted, we may say that the A effect, that is, the difference between A₁ and A₂, is not dependent upon B. In other words, we have approximately the same difference between A₁ and A₂ irrespective of the levels of B. Thus it is finally concluded that the rate of learning in high-noise condition and low-noise condition is not affected by the given level of illumination. When a nonsignificant interaction is graphically shown, it presents approximately parallel lines. This can be easily checked in Figure 21.4. When the interaction is significant, it yields nonparallel lines in a graphical representation.
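How far the two lines depart from being parallel can be expressed as a difference of differences between the cell means. The sketch below is illustrative and not part of the original text. It relies on the standard single-df contrast result that, for a 2 × 2 design with equal n per cell, the interaction SS equals n times the squared contrast divided by 4; the variable names are hypothetical.

```python
# Cell means from Table 21.8, keyed by (noise level, illumination level).
means = {("A1", "B1"): 18.1, ("A2", "B1"): 10.2,
         ("A1", "B2"): 19.8, ("A2", "B2"): 9.3}
n = 10  # subjects per cell

# Effect of noise (A1 minus A2) at each level of illumination.
effect_at_b1 = means[("A1", "B1")] - means[("A2", "B1")]
effect_at_b2 = means[("A1", "B2")] - means[("A2", "B2")]

# Zero would mean perfectly parallel lines in the plot of cell means.
contrast = effect_at_b1 - effect_at_b2

# Contrast weights are +1, -1, -1, +1, so the sum of squared weights is 4.
ss_interaction = n * contrast ** 2 / 4

print(f"Noise effect at B1: {effect_at_b1:.1f}")
print(f"Noise effect at B2: {effect_at_b2:.1f}")
print(f"Interaction SS:     {ss_interaction:.1f}")
```

The two noise effects differ somewhat, but the resulting interaction SS agrees with Step 7 of Table 21.9 and, as the F test showed, the departure from parallel lines is not significant.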
[Figure: line graph of the cell means (vertical axis: Means) against the levels of the independent variable B (horizontal axis: B₁, B₂), with one line for each level of A.]

Fig. 21.4 Means for levels of A at each level of B


The factorial design discussed above deals with only two variables. It can, however, be extended to three or any number of independent variables. Because of the complexity involved in calculation, we shall limit our discussion to three independent variables only. The simplest such factorial design, in which we have a total of eight treatment combinations, is a 2 × 2 × 2 factorial design. In this design, we have three factors (or independent variables) with two levels for each factor. The general case for a factorial design utilizing three independent variables is a K × L × M design where K, L and M stand for the first, second and third independent variables respectively.

* A.L. Edwards, Experimental Design in Psychological Research, Holt, Rinehart & Winston, 1968, Table VIII, p. 422.

Let us take an example of a 2 × 2 × 2 factorial experiment with a randomized-groups design. Suppose the experimenter wants to investigate the effect of three independent variables upon the retention of a task. The experimenter wants to study the main effects of these variables and their interactions, if any, upon the retention score. Let the first independent variable be the mode of presentation of the materials. This may be called A and this independent variable is manipulated in two ways: visual and auditory. That is, in one case the entire list is presented to the subjects and they themselves read it. This constitutes the visual mode of presentation. In another case, the entire list is read aloud to them and this constitutes the auditory presentation of the materials. Let the visual mode be called A₁ and the auditory mode be called A₂. The second independent variable is the number of presentations and this factor may be designated as B. This factor is varied by the experimenter in two ways: one complete presentation of the list and two complete presentations of the list. Let one presentation be designated as B₁ and the two presentations be designated as B₂. The third factor is the condition in which materials are presented. Let this condition be called C and be manipulated in two ways: noisy condition and quiet condition. We designate noisy condition as C₁ and quiet condition as C₂. Thus, all three factors, that is, A, B and C, are manipulated in two ways and therefore, each factor has two levels. There will be eight treatment combinations and a treatment will be obtained by selecting one level from each independent variable. For example, A₁B₂C₁ is one treatment and consists of a visual mode having two presentations in the noisy condition. Similarly, the other seven treatment combinations can be interpreted. Suppose, further, that 80 randomly selected subjects from a population of students belonging to an intermediate class have been made available to the experimenter. Assuming a randomized-groups design*, these 80 subjects are randomly assigned to eight treatment combinations or groups so that each group has 10 subjects. The dependent variable scores obtained in this experiment are presented in Table 21.11. The data of Table 21.11 may be reduced like those shown in Table 21.12 so that the means of eight groups may be put in the appropriate cells, which, in turn, facilitates further calculation. The reduced data are shown in Table 21.12.

* A factorial experiment can be conducted not only with the randomized-groups design but also with randomized-block design and Latin-square design.

Table 21.11 Fictitious dependent variable scores in a 2 × 2 × 2 factorial experiment

          Gr. I     Gr. II    Gr. III   Gr. IV    Gr. V     Gr. VI    Gr. VII   Gr. VIII
          A₁B₁C₁    A₁B₁C₂    A₂B₁C₁    A₂B₁C₂    A₁B₂C₁    A₁B₂C₂    A₂B₂C₁    A₂B₂C₂
          …         …         …         …         …         …         …         …
          10        10        9         8         16        14        12        15
          9         12        6         …         15        18        12        16
          9         …         6         8         14        17        11        17
          8         9         5         6         14        17        10        16
ΣX   =    84        93        45        70        151       173       113       157
ΣX²  =    724       895       225       506       2287      3021      1291      2473
Mean =    8.4       9.3       4.5       7.0       15.1      17.3      11.3      15.7

Table 21.12 Means for the eight experimental conditions in a 2 × 2 × 2 factorial design

(A₁B₁C₁)          (A₁B₁C₂)          (A₂B₁C₁)          (A₂B₁C₂)
ΣX = 84           ΣX = 93           ΣX = 45           ΣX = 70
Mean = 8.4        Mean = 9.3        Mean = 4.5        Mean = 7.0
n₁ = 10           n₂ = 10           n₃ = 10           n₄ = 10

(A₁B₂C₁)          (A₁B₂C₂)          (A₂B₂C₁)          (A₂B₂C₂)
ΣX = 151          ΣX = 173          ΣX = 113          ΣX = 157
Mean = 15.1       Mean = 17.3       Mean = 11.3       Mean = 15.7
n₅ = 10           n₆ = 10           n₇ = 10           n₈ = 10

ΣX = Total score

Now let us return to the calculation of ANOVA in a 2 × 2 × 2 factorial experiment. As stated earlier, the initial steps of calculating ANOVA are similar to those found in the 2 × 2 factorial design presented in Table 21.9.

Table 21.13 Illustration of the details of the calculation of ANOVA in a 2 × 2 × 2 factorial design (data are from Table 21.11 and Table 21.12)

Step 1: Correction term

\[ C = \frac{(\Sigma X_1 + \Sigma X_2 + \Sigma X_3 + \Sigma X_4 + \Sigma X_5 + \Sigma X_6 + \Sigma X_7 + \Sigma X_8)^2}{N} = \frac{(84 + 93 + 45 + 70 + 151 + 173 + 113 + 157)^2}{80} = \frac{(886)^2}{80} = 9812.45 \]

Step 2: Total SS

\[ \text{Total SS} = (\Sigma X_1^2 + \Sigma X_2^2 + \Sigma X_3^2 + \Sigma X_4^2 + \Sigma X_5^2 + \Sigma X_6^2 + \Sigma X_7^2 + \Sigma X_8^2) - C \]
\[ = (724 + 895 + 225 + 506 + 2287 + 3021 + 1291 + 2473) - 9812.45 = 11422 - 9812.45 = 1609.55 \]

Step 3: Among SS

\[ \text{Among SS} = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \dots + \frac{(\Sigma X_8)^2}{n_8} - C = \frac{(84)^2 + (93)^2 + (45)^2 + (70)^2 + (151)^2 + (173)^2 + (113)^2 + (157)^2}{10} - 9812.45 \]
\[ = \frac{7056 + 8649 + 2025 + 4900 + 22801 + 29929 + 12769 + 24649}{10} - 9812.45 = \frac{112778}{10} - 9812.45 = 1465.35 \]


id Research Methods 1" Behavioural Sciences 
pers gine 


524 Tests, Measure 


112778 9812.45 = 146535 
= 410 


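Step 4: Within SS = Total SS − Among SS = 1609.55 − 1465.35 = 144.20

Step 5: SS between two levels of A (or first independent variable):

\[
\begin{aligned}
&= \frac{(\Sigma X_1 + \Sigma X_2 + \Sigma X_5 + \Sigma X_6)^2}{n_1 + n_2 + n_5 + n_6} + \frac{(\Sigma X_3 + \Sigma X_4 + \Sigma X_7 + \Sigma X_8)^2}{n_3 + n_4 + n_7 + n_8} - C \\
&= \frac{(84 + 93 + 151 + 173)^2}{10 + 10 + 10 + 10} + \frac{(45 + 70 + 113 + 157)^2}{10 + 10 + 10 + 10} - 9812.45 \\
&= \frac{(501)^2}{40} + \frac{(385)^2}{40} - 9812.45 \\
&= 6275.025 + 3705.625 - 9812.45 = 168.20
\end{aligned}
\]

Step 6: SS between two levels of B (second independent variable):

\[
\begin{aligned}
&= \frac{(\Sigma X_1 + \Sigma X_2 + \Sigma X_3 + \Sigma X_4)^2}{n_1 + n_2 + n_3 + n_4} + \frac{(\Sigma X_5 + \Sigma X_6 + \Sigma X_7 + \Sigma X_8)^2}{n_5 + n_6 + n_7 + n_8} - C \\
&= \frac{(84 + 93 + 45 + 70)^2}{10 + 10 + 10 + 10} + \frac{(151 + 173 + 113 + 157)^2}{10 + 10 + 10 + 10} - 9812.45 \\
&= \frac{(292)^2}{40} + \frac{(594)^2}{40} - 9812.45 \\
&= 2131.6 + 8820.9 - 9812.45 = 1140.05
\end{aligned}
\]

Step 7: SS between two levels of C (third independent variable):

\[
\begin{aligned}
&= \frac{(\Sigma X_1 + \Sigma X_3 + \Sigma X_5 + \Sigma X_7)^2}{n_1 + n_3 + n_5 + n_7} + \frac{(\Sigma X_2 + \Sigma X_4 + \Sigma X_6 + \Sigma X_8)^2}{n_2 + n_4 + n_6 + n_8} - C \\
&= \frac{(84 + 45 + 151 + 113)^2}{10 + 10 + 10 + 10} + \frac{(93 + 70 + 173 + 157)^2}{10 + 10 + 10 + 10} - 9812.45 \\
&= \frac{(393)^2}{40} + \frac{(493)^2}{40} - 9812.45 \\
&= 3861.225 + 6076.225 - 9812.45 = 125.00
\end{aligned}
\]

Step 8: Interaction SS: A × B

\[
= \frac{[(a + d) - (b + c)]^2}{4n} = \frac{[(177 + 270) - (115 + 324)]^2}{(4)(20)} = 0.80
\]

Step 9: Interaction SS: A × C

\[
= \frac{[(a + d) - (b + c)]^2}{4n} = \frac{[(235 + 227) - (158 + 266)]^2}{(4)(20)} = 18.05
\]

Step 10: Interaction SS: B × C

\[
= \frac{[(a + d) - (b + c)]^2}{4n} = \frac{[(129 + 330) - (264 + 163)]^2}{(4)(20)} = 12.80
\]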




Step 11: Interaction SS: A × B × C

= Among SS − (SS for A + SS for B + SS for C + Interaction SS for A × B + Interaction SS for A × C + Interaction SS for B × C)
= 1465.35 − (168.20 + 1140.05 + 125.00 + 0.80 + 18.05 + 12.80)
= 1465.35 − 1464.90 = 0.45
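The first seven steps can be checked mechanically. The following is only a minimal sketch in Python, assuming the eight group sums and sums of squares used in Steps 1 to 3; the dictionary layout, the variable names and the helper `main_effect_ss` are illustrative and are not part of the original worked example.

```python
# Sum of X for Groups I-VIII of the worked example, keyed by
# (level of A, level of B, level of C). Layout and names are assumptions.
sum_x = {
    (1, 1, 1): 84, (1, 1, 2): 93, (2, 1, 1): 45, (2, 1, 2): 70,
    (1, 2, 1): 151, (1, 2, 2): 173, (2, 2, 1): 113, (2, 2, 2): 157,
}
sum_x2 = [724, 895, 225, 506, 2287, 3021, 1291, 2473]  # sum of X squared, Groups I-VIII
n = 10                  # scores per group
N = n * len(sum_x)      # 80 scores in all

correction = sum(sum_x.values()) ** 2 / N                        # Step 1
total_ss = sum(sum_x2) - correction                              # Step 2
among_ss = sum(t ** 2 for t in sum_x.values()) / n - correction  # Step 3
within_ss = total_ss - among_ss                                  # Step 4


def main_effect_ss(factor):
    """SS between the two levels of one factor (0 = A, 1 = B, 2 = C)."""
    ss = -correction
    for level in (1, 2):
        level_total = sum(t for key, t in sum_x.items() if key[factor] == level)
        ss += level_total ** 2 / (N // 2)  # 40 scores at each level
    return ss


ss_a, ss_b, ss_c = (main_effect_ss(f) for f in range(3))         # Steps 5-7

for label, value in [("Correction", correction), ("Total SS", total_ss),
                     ("Among SS", among_ss), ("Within SS", within_ss),
                     ("SS for A", ss_a), ("SS for B", ss_b), ("SS for C", ss_c)]:
    print(f"{label:>10}: {value:.2f}")
```

Run as written, the printed values should agree with those obtained by hand in Steps 1 to 7.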


An inspection of Table 21.13 reveals that the initial steps (that is, up to Step 4) for calculation of analysis of variance are similar to those found in the randomized-groups design. The among SS reflects the overall differences between the groups under study. In a 2 × 2 × 2 factorial design the among SS (or treatment sum of squares) is divided into several parts: between SS for A, between SS for B, between SS for C, interaction SS for A × B, interaction SS for A × C, interaction SS for B × C and interaction SS for A × B × C. The between SS for A is computed to know whether or not there is a significant difference between the visual mode (A1) and auditory mode (A2) of presentation. Likewise, the between SS for B is computed to know whether or not there is a significant difference between one presentation (B1) and two presentations (B2) of the list. So also the between SS for C is computed to know whether or not there is a difference between the noisy condition (C1) and the quiet condition (C2).


The interaction SS was computed with the help of the following formula:

\[
\text{Interaction SS} = \frac{[(a + d) - (b + c)]^2}{4n} \qquad (21.4)
\]

Equation 21.4 is applicable when the data of the factorial experiment have been reduced to a 2 × 2 table. Here a, b, c and d are the sums in the four cells of the 2 × 2 table and n is the number of scores in each cell. A schematic representation of a 2 × 2 table for computing the interaction SS with the help of Eq. 21.4 is presented below.


Table 21.14 A schematic representation of a 2 × 2 table for computation of the interaction SS

        A1    A2
B1      a     b
B2      c     d
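As a check on Eq. 21.4, the short sketch below compares it with the more familiar route of subtracting the two main-effect sums of squares from the sum of squares among the four cells. It is an illustration only: the function names are hypothetical, and the cell sums (177, 115, 324, 270, each based on 20 scores) are those used for the A × B interaction in this example.

```python
def interaction_ss_eq_21_4(a, b, c, d, n):
    """Eq. 21.4: a, b, c, d are the four cell sums, n the number of scores per cell."""
    return ((a + d) - (b + c)) ** 2 / (4 * n)


def interaction_ss_by_subtraction(a, b, c, d, n):
    """SS among the four cells minus the SS for rows and the SS for columns."""
    correction = (a + b + c + d) ** 2 / (4 * n)
    cells = (a ** 2 + b ** 2 + c ** 2 + d ** 2) / n - correction
    rows = ((a + b) ** 2 + (c + d) ** 2) / (2 * n) - correction
    cols = ((a + c) ** 2 + (b + d) ** 2) / (2 * n) - correction
    return cells - rows - cols


# Cell sums laid out as in Table 21.14: a = A1B1, b = A2B1, c = A1B2, d = A2B2.
a, b, c, d, n = 177, 115, 324, 270, 20
short_cut = interaction_ss_eq_21_4(a, b, c, d, n)
long_way = interaction_ss_by_subtraction(a, b, c, d, n)
print(round(short_cut, 2), round(long_way, 2))
```

Both routes give the same interaction SS for a 2 × 2 table with equal cell sizes, which is why the shorter formula can be used here.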


Thus, for computing the three first-order interactions, the data (ΣX) of Table 21.12 have been reduced as shown in Table 21.15. Each cell in Table 21.15 represents the sum of 20 scores or observations.

Table 21.15 2 × 2 tables for A × B, A × C and B × C interactions

(i) 2 × 2 table for A × B interaction:

        A1                  A2
B1      84 + 93 = 177       70 + 45 = 115
B2      151 + 173 = 324     113 + 157 = 270

(ii) 2 × 2 table for A × C interaction:

        A1                  A2
C1      84 + 151 = 235      45 + 113 = 158
C2      93 + 173 = 266      70 + 157 = 227

(iii) 2 × 2 table for B × C interaction:

        B1                  B2
C1      84 + 45 = 129       151 + 113 = 264
C2      93 + 70 = 163       173 + 157 = 330
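The reduction of the eight group sums into three 2 × 2 tables, and the remaining steps that depend on it, can be sketched as follows. This is a hedged illustration: `collapse` and `interaction_ss` are hypothetical helper names, and the among SS and main-effect SS are simply the values already obtained in Steps 3 and 5 to 7.

```python
# Group sums keyed by (level of A, level of B, level of C); layout is an assumption.
sum_x = {
    (1, 1, 1): 84, (1, 1, 2): 93, (2, 1, 1): 45, (2, 1, 2): 70,
    (1, 2, 1): 151, (1, 2, 2): 173, (2, 2, 1): 113, (2, 2, 2): 157,
}
n = 10  # scores per group


def collapse(f1, f2):
    """Add the group sums over the factor that is neither f1 nor f2."""
    table = {}
    for key, total in sum_x.items():
        cell = (key[f1], key[f2])
        table[cell] = table.get(cell, 0) + total
    return table


def interaction_ss(table, n_cell):
    """Eq. 21.4 applied to a collapsed 2 x 2 table of sums."""
    a, b = table[(1, 1)], table[(2, 1)]
    c, d = table[(1, 2)], table[(2, 2)]
    return ((a + d) - (b + c)) ** 2 / (4 * n_cell)


n_cell = 2 * n  # each collapsed cell is the sum of 20 scores
ss_ab = interaction_ss(collapse(0, 1), n_cell)  # Step 8
ss_ac = interaction_ss(collapse(0, 2), n_cell)  # Step 9
ss_bc = interaction_ss(collapse(1, 2), n_cell)  # Step 10

# Step 11: subtract the six components from the among SS (values from Steps 3, 5, 6, 7).
among_ss, ss_a, ss_b, ss_c = 1465.35, 168.20, 1140.05, 125.00
ss_abc = among_ss - (ss_a + ss_b + ss_c + ss_ab + ss_ac + ss_bc)
print(ss_ab, ss_ac, ss_bc, round(ss_abc, 2))
```

Collapsing by key position rather than typing the three tables by hand reduces the chance of an addition error, which is the most common slip at this stage.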

Based upon these sums, the interaction SS have been calculated (see Steps 8, 9 and 10 of Table 21.13). When the sums of squares for A, B, C, A × B, A × C and B × C have been calculated, the second-order interaction (A × B × C) can be easily calculated by adding the above six sums of squares together and subtracting the summed value from the among SS. The A × B × C interaction can, however, be calculated directly, but owing to its lengthy process of calculation the author does not think it proper to elaborate upon it here. Once more it must be added that none of the values from Steps 1 through 11 can be negative. If any one of them happens to be negative, the investigator should try to find out the error before he proceeds further. After calculating the A × B × C interaction, the summary of the complete analysis of variance is set up in the manner presented in Table 21.16.

Table 21.16 A summary of the complete ANOVA in a 2 × 2 × 2 factorial experiment

Source of variation                              SS        df    MS         F
Among all groups                                 1465.35    7    209.336    104.511
Within groups                                     144.20   72      2.003
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
A: Mode                                           168.20    1    168.20      83.974
B: Number                                        1140.05    1   1140.05     569.171
C: Condition                                      125.00    1    125.00      62.406
A × B: Mode × Presentation                          0.80    1      0.80       0.399
A × C: Mode × Condition                            18.05    1     18.05       9.011
B × C: Presentation × Condition                    12.80    1     12.80       6.390
A × B × C: Mode × Presentation × Condition          0.45    1      0.45       0.224
Error: Within groups                              144.20   72      2.003
Total                                            1609.55   79

The entries above the dotted line in Table 21.16 may be omitted if the purpose of the experimenter is not to determine the overall difference among the eight groups. This is particularly true in the case of a factorial experiment having three independent variables. The among SS is divided into seven parts. Accordingly, its df can also be divided into seven parts. The df for among SS and within SS have been calculated by Eqs. 21.1 and 21.2 respectively. For calculating the sum of squares for A, we have only two groups, each consisting of 40 subjects. Thus B and C are temporarily ignored. As two groups are formed for A, the df for the A effect, by Eq. 21.1, is 2 − 1 = 1. Similarly, for the sum of squares for B, the two other independent variables, namely A and C, are temporarily ignored, and for the sum of squares for C the independent variables A and B are temporarily ignored. In each case, all the groups are reduced to two groups. In the B effect, one group consisting of 40 observations is for B1 and another group consisting of another 40 observations is for B2. Similarly, in the C effect, one group consisting of 40 observations is for C1 and another group consisting of another 40 observations is for C2. The df for the B effect, by Eq. 21.1, is 2 − 1 = 1, and the df for the C effect by the same formula is 2 − 1 = 1. For calculating the df for the A × B × C interaction, we multiply the df for A by the df for B and by the df for C. As we have 1 df for each of the three main effects, the df for the A × B × C interaction is also 1. Similarly, the df for the three first-order interactions have been calculated. The mean square or variance for each source is calculated by dividing the sum of squares by its corresponding df. For example, the within SS is 144.20 which, if divided by 72, yields 2.003 as the mean square. The df for within groups has been calculated by Equation 21.2. As the above example illustrates the fixed model of analysis of variance, we can reasonably choose the within variance to act as the error term in calculating the F ratios. Therefore, the F ratios for main effects and interaction effects have been calculated as follows:





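\[
\begin{aligned}
F \text{ for } A &= \frac{168.20}{2.003} = 83.974 \\
F \text{ for } B &= \frac{1140.05}{2.003} = 569.171 \\
F \text{ for } C &= \frac{125.00}{2.003} = 62.406 \\
F \text{ for } A \times B &= \frac{0.80}{2.003} = 0.399 \\
F \text{ for } A \times C &= \frac{18.05}{2.003} = 9.011 \\
F \text{ for } B \times C &= \frac{12.80}{2.003} = 6.390 \\
F \text{ for } A \times B \times C &= \frac{0.45}{2.003} = 0.224
\end{aligned}
\]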
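The division of each mean square by the error term, and the comparison with the tabled F, can be sketched in a few lines. The sketch assumes the fixed model used in this example, takes the sums of squares from Table 21.16, and uses the critical value of 3.975 (df = 1/72, 0.05 level) quoted in the text; the variable names are illustrative.

```python
ss = {"A": 168.20, "B": 1140.05, "C": 125.00,
      "AxB": 0.80, "AxC": 18.05, "BxC": 12.80, "AxBxC": 0.45}
within_ss = 144.20
n_groups, n_per_group = 8, 10

df_within = n_groups * (n_per_group - 1)   # 8 groups x 9 = 72
df_effect = 1                              # (2 - 1) for every main effect and product thereof
ms_within = round(within_ss / df_within, 3)  # the text rounds the error term to 2.003

critical_f = 3.975  # value quoted in the text for df = 1/72 at the 0.05 level

f_ratios = {name: (value / df_effect) / ms_within for name, value in ss.items()}
significant = {name: f > critical_f for name, f in f_ratios.items()}

for name in ss:
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name:>6}: F = {f_ratios[name]:.3f} ({verdict})")
```

The output should match the F column of Table 21.16, except that the last ratio prints as 0.225 when rounded rather than the tabled .224, a difference in the third decimal place only.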


Interpretation of F Ratios

In the above factorial experiment the alternative hypotheses* (AH) and null hypotheses (NH) were as follows:

(i) AH: There will be a significant difference between the means of the visual mode (A1) and the auditory mode (A2).
NH: There will be no significant difference between the means of the visual mode (A1) and the auditory mode (A2).

(ii) AH: There will be a significant difference between the means of one presentation (B1) and two presentations (B2).
NH: There will be no significant difference between the means of one presentation and two presentations.

(iii) AH: There will be a significant difference between the means of the noisy condition (C1) and the quiet condition (C2).
NH: There will not be a significant difference between the means of the noisy condition and the quiet condition.

(iv) AH: There will be interaction between the two independent variables, namely, the mode of presentation (A) and the number of presentations (B).
NH: There will be no interaction between the mode and the number of presentations.

(v) AH: There will be interaction between the modes of presentation (A) and the conditions (C) in which materials are presented.
NH: There will be no interaction between the modes of presentation and the conditions (C) in which materials are presented.

(vi) AH: There will be interaction between the number of presentations (B) and the conditions (C) in which materials are presented.
NH: There will be no interaction between the number of presentations and the conditions in which materials are presented.

(vii) AH: There will be interaction among the three independent variables, namely, mode (A), number (B) and condition (C).
NH: There will not be any significant interaction among the three independent variables, namely, mode (A), number (B) and condition (C).

* All alternative hypotheses have been developed in the two-tailed manner. One-tailed hypotheses may also be developed.

The first three hypotheses are related to the main effects, the last hypothesis is concerned with the second-order interaction and the remaining three hypotheses are concerned with the first-order interactions. Let the alpha level be set at 0.05. The F for the first main effect, that is, for the A effect, is 83.974, for which we have 1 df in the numerator and 72 df in the denominator. The required F value for df = 1/72 at 0.05 level = 3.975 (by interpolation). As the obtained F exceeds this value, we reject the null hypothesis and accept the alternative hypothesis. This A effect represents the comparison between the means for the visual mode and the auditory mode, averaged over the two levels of B and the two levels of C. The mean for the visual mode is 501/40 = 12.525 and the mean for the auditory mode is 385/40 = 9.625. This means that the visual mode of presentation results in better average retention than the auditory mode of presentation.


The B effect corresponds to the comparison between the means of one presentation and two presentations, averaged over the two levels of A and the two levels of C. The obtained F for the B effect is 569.171 and we have 1 df in the numerator and 72 df in the denominator. As the obtained F exceeds the required value of F (3.975) at 0.05 level, the null hypothesis is rejected and the alternative hypothesis is accepted. Thus, there is a significant difference between the means of one presentation and two presentations. As the mean for one presentation (292/40 = 7.3) is far less than the mean for two presentations (594/40 = 14.85), we conclude that two presentations of the task result in better average retention than one presentation.

The C effect represents the comparison between the means of the noisy condition and the quiet condition, averaged over the two levels of A and the two levels of B. The obtained F for C is 62.406, which has 1 df in the numerator and 72 df in the denominator. The obtained F exceeds the required F for the above dfs at 0.05 level. Hence the third null hypothesis is also rejected and the alternative hypothesis is accepted. The mean for the noisy condition is 393/40 = 9.825 and the mean for the quiet condition is 493/40 = 12.325. As the mean difference is significant, we conclude that the quiet condition results in better average retention than the noisy condition.

Of the three first-order interactions, one is not significant and two are significant. The A × B interaction is not significant because the obtained F for this interaction is less than 1. The nonsignificant nature of the A × B interaction can also be checked from the two nearly parallel lines in Figure 21.5. As F for the A × B interaction is not significant, the null hypothesis, rather than the alternative hypothesis, is accepted. The nonsignificant A × B interaction suggests that the difference between the means of the visual mode (A1) and the auditory mode (A2) does not depend upon whether the task is presented once (B1) or twice (B2). Or, it may be interpreted in the reversed direction.

But the A × C interaction is significant because the obtained F for this interaction is 9.011, which has 1 df in the numerator and 72 df in the denominator. As this obtained F exceeds the required F at 0.05 level for the above dfs, we reject the null hypothesis and accept the alternative hypothesis. The significant nature of the A × C interaction is also obvious from the two nonparallel lines in Figure 21.6. The significant A × C interaction suggests that the mean difference between the visual mode (A1) and the auditory mode (A2) is dependent upon (or determined by) the noisy and the quiet condition in which the task is presented. Thus the A effect is not the same for the two levels of C. In the noisy condition (C1) the visual mode of presentation (A1) yields the mean of 235/20 = 11.75 and the auditory mode (A2) yields the mean of 158/20 = 7.9. The difference between A1 and A2 for C1 is 11.75 − 7.9 = 3.85. In the quiet condition (C2) the visual mode yielded the mean of 266/20 = 13.3 and the auditory mode yielded the mean of 227/20 = 11.35. The difference between A1 and A2 for C2 is 13.3 − 11.35 = 1.95. A simple comparison of the mean difference for C1 as well as for C2 reveals that the magnitude of the difference is not the same at both levels of C, a fact which has made the A × C interaction significant.

Fig. 21.5 Means for levels of A at each level of B

Fig. 21.6 Means for levels of A at each level of C

The B × C interaction is also significant because the obtained F (6.390) exceeds the required value of F (3.975). This F also has 1 df in the numerator and 72 df in the denominator. We accept the alternative hypothesis and reject the null hypothesis. The significant B × C interaction can also be read from the two nonparallel lines shown in Figure 21.7. The B × C interaction may be taken to mean that the difference between B1 and B2 at each level of C is not the same. In the noisy condition (C1) the mean of one presentation (B1) is 129/20 = 6.45 and the mean of two presentations (B2) is 264/20 = 13.2. Therefore, we get a mean difference of 6.75 between B1 and B2 for C1. In the quiet condition (C2) the mean of one presentation (B1) is 163/20 = 8.15 and the mean of two presentations (B2) is 330/20 = 16.50. Thus for C2 the mean difference between B1 and B2 is 8.35. It is obvious that the mean difference between B1 and B2 for C1 is not the same as the mean difference between B1 and B2 for C2. The greater this difference, the greater the probability of the interaction being significant.

Fig. 21.7 Means for levels of B at each level of C

The A × B × C interaction is not significant and hence our last null hypothesis is not rejected, meaning thereby that the alternative hypothesis is rejected. This second-order interaction or three-factor interaction needs a little more explanation.
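The comparison of mean differences that underlies the reading of each first-order interaction can be sketched as below. The cell sums are those of Table 21.15 (each based on 20 scores); the function name and dictionary layout are illustrative assumptions.

```python
def mean_differences(sums, n_cell):
    """For a 2 x 2 table of sums keyed (row level, column level), return the
    difference between the means of row levels 1 and 2 at each column level."""
    return {col: (sums[(1, col)] - sums[(2, col)]) / n_cell for col in (1, 2)}


n_cell = 20
ab = {(1, 1): 177, (2, 1): 115, (1, 2): 324, (2, 2): 270}  # keyed (A, B)
ac = {(1, 1): 235, (2, 1): 158, (1, 2): 266, (2, 2): 227}  # keyed (A, C)
bc = {(1, 1): 129, (2, 1): 264, (1, 2): 163, (2, 2): 330}  # keyed (B, C)

a_at_b = mean_differences(ab, n_cell)  # A1 - A2 at B1 and at B2
a_at_c = mean_differences(ac, n_cell)  # A1 - A2 at C1 and at C2
b_at_c = mean_differences(bc, n_cell)  # B1 - B2 at C1 and at C2

for label, diffs in [("A1 - A2 across B", a_at_b),
                     ("A1 - A2 across C", a_at_c),
                     ("B1 - B2 across C", b_at_c)]:
    print(f"{label}: level 1 = {diffs[1]:.2f}, level 2 = {diffs[2]:.2f}")
```

The two differences are close together for A × B (3.10 and 2.70) and further apart for A × C (3.85 and 1.95) and B × C (−6.75 and −8.35), which is the numerical counterpart of parallel versus nonparallel lines in the figures.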
Just as the two-factor interaction, for example the A × B interaction, is the symmetrical property of A and B, so also the A × B × C interaction is regarded as the symmetrical property of all three factors. This means that we can explain the A × B × C interaction by considering the A × B interaction separately for each level of C, or by considering the A × C interaction separately for each level of B, or by considering the B × C interaction separately for each level of A. Let us explain the A × B × C interaction by considering the A × B interaction separately for each level of C. As the A × B × C interaction is not significant because the obtained F for this interaction is less than 1, the A × B interaction for C1 and the A × B interaction for C2 should be the same. In other words, when the A × B interaction is considered for each level of C, the meaning of a nonsignificant second-order interaction is that the A × B interaction is the same for C1 and C2. The A × B interaction for C1 has been graphically shown in Figure 21.8 and for C2 in Figure 21.9. For graphically showing the A × B × C interaction, the data have been reduced to a 2 × 2 table form as presented in Table 21.17. It is obvious that the forms of the graphs in Figures 21.8 and 21.9 are nearly similar, which confirms the fact that the A × B × C interaction is not significant.

Fig. 21.8 Means for levels of A at each level of B for C1

Fig. 21.9 Means for levels of A at each level of B for C2

Table 21.17 Means for the A × B interaction at each level of C

For C1                         For C2
        A1      A2                     A1      A2
B1      8.4     4.5            B1      9.3     7.0
B2      15.1    11.3           B2      17.3    15.7
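The direct calculation of the A × B × C sum of squares is shorter than it may appear when every factor has two levels. The sketch below uses a contrast over the eight group sums, which is a standard device for such designs but is not the procedure followed in this chapter; the sign coding (+1 for level 1, −1 for level 2) and the names are assumptions made for illustration.

```python
# Group sums keyed by (level of A, level of B, level of C); layout is an assumption.
sum_x = {
    (1, 1, 1): 84, (1, 1, 2): 93, (2, 1, 1): 45, (2, 1, 2): 70,
    (1, 2, 1): 151, (1, 2, 2): 173, (2, 2, 1): 113, (2, 2, 2): 157,
}
n = 10  # scores per group
sign = {1: +1, 2: -1}


def ab_contrast_at(c_level):
    """A x B contrast on the group sums at one level of C: (A1B1 - A2B1) - (A1B2 - A2B2)."""
    return sum(sign[a] * sign[b] * total
               for (a, b, c), total in sum_x.items() if c == c_level)


# The three-factor contrast is the A x B contrast at C1 minus the A x B contrast at C2.
abc_contrast = ab_contrast_at(1) - ab_contrast_at(2)
ss_abc_direct = abc_contrast ** 2 / (n * len(sum_x))

print(ab_contrast_at(1), ab_contrast_at(2), abc_contrast, ss_abc_direct)
```

The A × B contrasts at C1 and C2 differ very little, and the resulting sum of squares agrees with the 0.45 obtained by subtraction in Step 11, which is another way of seeing that the A × B interaction is nearly the same at both levels of C.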


Advantages and Disadvantages of Factorial Experiments

Factorial experiments have several advantages over a single IV experiment.

1. In a factorial design two or more independent variables are simultaneously manipulated, whereas in a single IV experiment, as its name implies, a separate experiment is designed to study the effect of each independent variable. Thus a factorial experiment provides economy of time, labour and money.

2. Factorial experiments also permit the evaluation of the effect of interaction upon the dependent variable. In a single IV experiment, one cannot evaluate the effect of the interaction of the independent variables because only one independent variable is manipulated at a time.

3. The experimental results of a factorial experiment are more comprehensive and can be generalized to a wider range due to the manipulation of several independent variables in one experiment. From this point of view, single IV experiments suffer a major setback.

4. In factorial experiments, there is an additional gain occurring due to the hidden replication arising from the factorial arrangement itself.

Despite these advantages, a factorial experiment has some disadvantages.


1. Sometimes the experimental set-up and the resulting statistical analyses become so complex that the experimenter may wish to drop this design and return to a single IV experiment. This is especially true when more than three independent variables, each with three or more levels, are to be manipulated together.

2. In factorial experiments, when the number of treatment combinations or treatments becomes large, it becomes difficult for the experimenter to select a homogeneous experimental unit (or subject).

3. Sometimes it happens that some treatment combinations arising out of the simultaneous manipulation of several independent variables become meaningless. Then, the resources spent on those combinations are simply wasted.


Selecting a Correct Error Term in a Factorial F Test 


To select a correct error term in ANOVA is an important problem. In the randomized-groups design, the within-groups mean square is used as the correct error term in an F test. This is also the usual case in the factorial experiment. But sometimes we do not use the within-groups mean square as the error term in a factorial experiment. In order to understand this situation, we shall examine three common types of factorial experiments. The first type is known as a fixed model, the second type is known as a random model and the third type is known as a mixed model. A model refers to the characteristics of the independent variables used in the factorial experiment. Let us discuss these three models separately and illustrate each with the simplest factorial design, that is, with a 2 × 2 factorial design.




Fixed Model: In a fixed model of ANOVA, the experimenter has some obvious particular reasons to select the values of the independent variables. In other words, the experimenter does not choose the values of the independent variable at random; rather he chooses them for some particular reasons. For example, the experimenter may choose different levels of light, practice, methods of teaching, etc., as the different values of the independent variables for his experiment. He may choose to compare a 10-trial performance with a [...]-trial performance; he may choose methods A and B and compare their impact upon the dependent variable measure. All these illustrate the case of a fixed model. In psychological research the case of the fixed model is highly common. For the case of a fixed model, the within-groups mean square (MS) is always used as the correct error term for all F tests. Thus, for a 2 × 2 factorial design we have three Fs which are calculated as follows:

\[
\begin{aligned}
F \text{ for } A \text{ (or first independent variable)} &= \frac{\text{MS for } A}{\text{MS for within-groups}} \\
F \text{ for } B \text{ (or the second independent variable)} &= \frac{\text{MS for } B}{\text{MS for within-groups}} \\
F \text{ for } A \times B &= \frac{\text{MS for } A \times B}{\text{MS for within-groups}}
\end{aligned}
\]

In a factorial experiment with two independent variables the total sum of squares is divided into four components, namely, the sum of squares for factor A, for factor B, for interaction A × B and for within-groups. Based on these four sums of squares, we calculate four mean squares or variances separately, and our null hypotheses are tested in terms of F tests, which are expressed in terms of the ratio of these variances. Except the mean square for within-groups, all other mean squares may contain more than one component of variance. Table 21.18 presents the expected mean squares for a fixed model, a random model and a mixed model. Look at the fixed model, as we are discussing only this model at present. The expected mean square for the A factor is expressed by \(\sigma_w^2 + 2n\sigma_A^2\). This means that the mean square for A contains the mean square for within-groups (\(\sigma_w^2\)) and 2n times the variance of the A factor (\(\sigma_A^2\)). For the present purpose, we can ignore a constant like 2n and concentrate on the mean squares or variances only. Since the expected mean square for A contains the within variance as well as the variance due to A, we divide this mean square by the within variance. This is the rationale behind using the within variance as the correct error term in a fixed model. The same reasoning applies for the independent variable B and hence the within variance is used as the error term. The expected mean square for interaction A × B contains the within variance (\(\sigma_w^2\)) and n times the interaction variance (\(\sigma_{AB}^2\)). Dropping the constant n as usual, we may say that the expected mean square for interaction A × B is composed of the within variance as well as the interaction variance. When we divide this mean square by the within variance, we get the actual F value for the interaction.

Table 21.18 Expected mean squares for a 2 × 2 factorial design in different models

\[
\begin{array}{lllll}
\text{Source of variation} & \text{Fixed model (Model I)} & \text{Random model (Model II)} & \text{Mixed model (Model III): } A \text{ fixed, } B \text{ random} & \text{Mixed model (Model III): } A \text{ random, } B \text{ fixed} \\
\text{Independent variable } A & \sigma_w^2 + 2n\sigma_A^2 & \sigma_w^2 + n\sigma_{AB}^2 + 2n\sigma_A^2 & \sigma_w^2 + n\sigma_{AB}^2 + 2n\sigma_A^2 & \sigma_w^2 + 2n\sigma_A^2 \\
\text{Independent variable } B & \sigma_w^2 + 2n\sigma_B^2 & \sigma_w^2 + n\sigma_{AB}^2 + 2n\sigma_B^2 & \sigma_w^2 + 2n\sigma_B^2 & \sigma_w^2 + n\sigma_{AB}^2 + 2n\sigma_B^2 \\
\text{Interaction } A \times B & \sigma_w^2 + n\sigma_{AB}^2 & \sigma_w^2 + n\sigma_{AB}^2 & \sigma_w^2 + n\sigma_{AB}^2 & \sigma_w^2 + n\sigma_{AB}^2 \\
\text{Within-groups} & \sigma_w^2 & \sigma_w^2 & \sigma_w^2 & \sigma_w^2
\end{array}
\]


Random Model: When the values of the independent variables are selected at random, we have the case of a random model. In a 2 × 2 factorial design, suppose the experimenter selects the age of subjects and the method of teaching as two independent variables. Now, he may decide to select two levels of each independent variable. He may place reasonable possible values of age, say, ranging from 6 years to 30 years (25 slips), into a box and draw any two slips (which specify two different age levels) at random. Likewise, suppose there are 6 reasonable possible methods of teaching. These may be numbered from 1 to 6, each on a separate slip, and placed in a box. After reshuffling, the experimenter may select any two slips at random. The numbers thus drawn will represent the two methods of teaching selected at random. The column for a random model in Table 21.18 indicates that the expected mean square for the independent variable A is \(\sigma_w^2 + n\sigma_{AB}^2 + 2n\sigma_A^2\). Dropping the constants 2n and n for the present purpose and concentrating only upon the variances, we can say that the expected mean square for A consists of the within variance (\(\sigma_w^2\)), the interaction variance (\(\sigma_{AB}^2\)) and the variance due to the factor A. Hence, the error term here must contain both the interaction variance and the within variance. Since the mean square for the A × B interaction contains exactly these two variances, that is, the interaction variance and the within variance, we divide the mean square for A by the mean square for interaction. Similar is the case with the independent variable B. Hence, we divide the mean square for this variable by the mean square for interaction. The mean square for the A × B interaction contains the within variance (\(\sigma_w^2\)) and the interaction variance (\(\sigma_{AB}^2\)). Hence, we divide it by the within variance and obtain the F for the interaction. Thus in a random model of a 2 × 2 factorial design, we compute the three Fs as follows:

\[
\begin{aligned}
F \text{ for } A &= \frac{\text{MS for } A}{\text{MS for interaction}} \\
F \text{ for } B &= \frac{\text{MS for } B}{\text{MS for interaction}} \\
F \text{ for } A \times B &= \frac{\text{MS for interaction}}{\text{MS for within-groups}}
\end{aligned}
\]
Mixed Model: As its name implies, the case for the mixed model is one in which one variable is fixed and another variable is random. Let us first consider a mixed model in which the independent variable A is fixed and the independent variable B is random. In such a model the expected mean square for the independent variable A consists of the within variance (\(\sigma_w^2\)), the interaction variance (\(\sigma_{AB}^2\)) and the variance due to A (\(\sigma_A^2\)). As the expected mean square for the A × B interaction in such a mixed model contains the first two of these variances, we divide the mean square for the independent variable A by the mean square for interaction and obtain the F for A. The expected mean square for independent variable B contains the within variance and the variance due to B (\(\sigma_B^2\)). Hence, we divide this mean square by the within variance and obtain the F for B. Likewise, the expected mean square for the A × B interaction contains the within variance (\(\sigma_w^2\)) and the interaction variance (\(\sigma_{AB}^2\)). Dividing this mean square by the within variance, we get the F for the interaction.

In a mixed model having B fixed and A random, the expected mean square for A contains the within variance (\(\sigma_w^2\)) and the variance due to A (\(\sigma_A^2\)). Hence, we divide this mean square by the within variance and obtain the F for A. The expected mean square for B contains three variances, namely, the within variance, the interaction variance and the variance due to B. As the expected mean square for the A × B interaction contains the first two of these variances, the mean square for B is divided by the mean square for interaction. As the expected mean square for the A × B interaction contains the within variance as well as the interaction variance, this mean square is divided by the within variance. Thus the three Fs in a mixed model are as shown below.

Mixed model (A fixed, B random)

\[
\begin{aligned}
F \text{ for } A &= \frac{\text{MS for } A}{\text{MS for interaction}} \\
F \text{ for } B &= \frac{\text{MS for } B}{\text{MS for within-groups}} \\
F \text{ for } A \times B &= \frac{\text{MS for interaction}}{\text{MS for within-groups}}
\end{aligned}
\]

Mixed model (B fixed, A random)

\[
\begin{aligned}
F \text{ for } A &= \frac{\text{MS for } A}{\text{MS for within-groups}} \\
F \text{ for } B &= \frac{\text{MS for } B}{\text{MS for interaction}} \\
F \text{ for } A \times B &= \frac{\text{MS for interaction}}{\text{MS for within-groups}}
\end{aligned}
\]

PROBLEM OF CREATING EQUIVALENT GROUPS IN BETWEEN-SUBJECTS DESIGN

As we know, in between-subjects designs a subject or participant is asked to participate in just one of the conditions of the experiment. Such a design usually becomes necessary when subject variables (such as gender) are being studied, or when participating in one condition of the experiment tends to introduce changes in subjects in such a way that they cannot be used in another condition.

There are two common techniques for creating equivalent groups in the between-subjects design. One technique is to use random assignment and another is to use matching. Random assignment, which is different from random selection of subjects, is a method for placing the selected subjects into different groups in such a way that every subject in the study has an equal chance of being placed in each group to be formed. The major goal of random assignment is to take into account individual difference factors that may bias the study and spread them evenly throughout the different groups of participants. When a large number of subjects is taken, it is highly probable that random assignment alone will yield equivalent groups. For the purpose of completing a random assignment of subjects to conditions in a way that ensures an equal number of participants per group, the experimenter can use block randomization, a procedure that ensures that each condition of the experiment has a subject randomly assigned to it before any condition is repeated a second time. In fact, each block contains all the conditions of the experiment in a randomized order.

Matching is another technique through which equivalent groups can be created. In matching, the participants are paired on a matching variable and one member of each pair is then assigned to one of the groups. Matching is appropriate under the following conditions:

(a) When the number of subjects is small, so that random assignment is risky and might yield unequal groups.
(b) When the matching variable is anticipated to affect the results in some predictable fashion (that is, the matching variable is correlated with the dependent variable).
(c) When there is a reasonable way of assessing the participants on the matching variable.

Of these three criteria, the first is true most of the time (unequal groups sometimes occur even with larger N), but the second and third are essential.
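Both techniques described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the participant labels, condition names and matching scores are hypothetical, the functions `block_randomize` and `matched_assignment` are not taken from any library, and the matching sketch handles two groups only, pairing subjects with adjacent scores and leaving out the last subject if their number is odd.

```python
import random


def block_randomize(participants, conditions, rng):
    """Assign participants block by block; every block contains each
    condition exactly once, in a freshly shuffled order."""
    assignment = {}
    block = []
    for person in participants:
        if not block:                 # start a new block when the old one is used up
            block = list(conditions)
            rng.shuffle(block)
        assignment[person] = block.pop()
    return assignment


def matched_assignment(scores, rng):
    """Pair participants with adjacent scores on the matching variable, then
    send one randomly chosen member of each pair to each of two groups."""
    ranked = sorted(scores, key=scores.get)
    group_1, group_2 = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        group_1.append(pair[0])
        group_2.append(pair[1])
    return group_1, group_2


rng = random.Random(0)  # seeded only so that the illustration is repeatable

participants = [f"P{i}" for i in range(1, 9)]           # hypothetical subjects
assignment = block_randomize(participants, ["experimental", "control"], rng)

# Hypothetical scores on the matching variable (for example, a pretest).
scores = {"P1": 12, "P2": 25, "P3": 14, "P4": 31, "P5": 27, "P6": 9, "P7": 33, "P8": 18}
group_1, group_2 = matched_assignment(scores, rng)

print(assignment)
print(group_1, group_2)
```

Block randomization guarantees equal group sizes whenever the number of subjects is a multiple of the number of conditions, while the matching routine guarantees that the two groups are closely balanced on the matching variable itself.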


WITHIN-SUBJECTS DESIGN 


Till now we were discussing the between-subjects des) 
design, the matched-groups design and the factorial 
independent variables are selected and one value is assigned to each FIIs hie expaciment. 
Thus in a between-subjects design, the experimenter Compares the dependent variable score 
between two or more groups who are given differential treatment. In aw ithin-subjects design the 
same group of subjects is treated differently in different experimental conditions and finally. their 
dependent variable scores are compared. This is also called repeated-treatments design because 
the same individuals are treated differently at different times and we compare their scores as a 
result of different experimental treatments. Let us iMustrate this design with an example. Suppose 
using the within-subjects design, the experimenter wants lo Investigate the effect of a drug upon 
retention of a verbal task. He may test a group of subjects under two conditions, namely, in 
normal condition when the group has not taken a drug and a condition when it is under the 
influence of a drug, Thus each subject of the group has a pair of measures on the dependent 
variable, Subsequently, we may test the Signiticance ot the difference between means through 
appropriate statistical tests. Any change in the dependent variable measures may then be 
ascribed to the application of the drug. The above experiment can also be conducted through the 
between-subjects design where the experimenter is required to have two equivalent groups of 
subjects and he may assign one group to the experimental condition where the subjects will be 
under the influence of a drug and another group to the control Condition where the subjects will 
be in a normal state, that is, without any influence of a drug. Subsequently, the mean scores 
obtained on the dependent variable measure may be compared and their significance of 
difference may be tested through appropriate statistical tests. The experiment based upon the 
within-subjects design is common in the field of learning, memory and psychophysics. 




because in the randomized-groups
design two or more values of the


The within-subjects design may, for convenience, be divided into two parts: a design having two conditions and many subjects, and a design having more than two conditions and many subjects. In the former case, the same group is tested under two conditions only. The mean difference under these two conditions may be tested through the matched t test or Sandler's A test. The experiment described above illustrates the within-groups design having two conditions. The within-groups design may also be extended to a situation where more than two experimental conditions are utilized. The experiments based upon such a design have been frequently reported in the field of proactive inhibition. The experiment may be, here, designed as having five experimental conditions and the same group of subjects may be tested under each of them. In condition I the group may be required to learn zero list; in condition II the group may be required to learn three lists; in condition III four lists; in condition IV five lists; and in condition V seven lists. At the end of each condition, the group may be required to learn one new list. After a time interval of one hour since learning this new list in each condition, the group may be required to recall items from this new list. It is expected that the number of prior lists learned will interfere with the retention of the new list. Greenberg & Underwood (1950) conducted several experiments using such within-groups designs on proactive inhibition and have shown that the greater the number of prior lists learned, the higher is the amount of proactive inhibition.
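For the two-condition within-subjects case, the matched t test and Sandler's A test both work on the difference score of each subject. The following is a minimal sketch, assuming made-up retention scores; the function names and the data are illustrative only and do not come from the text. Sandler's A is computed here as the sum of squared differences divided by the square of the summed differences.

```python
import math

def matched_t(cond_a, cond_b):
    """Matched (paired) t statistic for two conditions measured on the same subjects."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Unbiased variance of the difference scores
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

def sandler_a(cond_a, cond_b):
    """Sandler's A: sum of squared differences over the squared sum of differences."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    return sum(d * d for d in diffs) / sum(diffs) ** 2

# Hypothetical retention scores for six subjects (illustration only)
normal = [12, 15, 11, 14, 13, 16]
drug = [10, 14, 9, 13, 10, 15]

n = len(normal)
t = matched_t(normal, drug)   # evaluated with n - 1 degrees of freedom
A = sandler_a(normal, drug)
print(f"t({n - 1}) = {t:.2f}, A = {A:.3f}")
```

The two statistics carry the same information, since t squared equals (n - 1) / (nA - 1); a small A therefore corresponds to a large t.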


PROBLEM OF CONTROLLING SEQUENCE EFFECTS IN WITHIN-SUBJECTS DESIGN


As mentioned earlier, a within-subjects design is one where each subject participates in all conditions of the experiment. For such a design, participating in one condition may affect how subjects tend to behave in other conditions. As a consequence, sequence or order effects occur. Sequence effects include both progressive effects and carryover effects. A within-subjects design becomes necessary when subjects are scarce. Besides, such designs eliminate the problem of creating equivalent groups that occurs with between-subjects designs.

Now the problem is how to control sequence effects in a within-subjects design. Sequence effects include progressive effects and carryover effects. In an experiment using a within-subjects design, trial No. 1 might affect the subject in some way so that performance on trial 2 may be steadily improved. Sometimes, repeated trials may also produce gradual fatigue so that the performance may decline from trial 1 to trial 2. These two effects are called progressive effects since it is assumed that performance changes progressively from trial to trial. Likewise, in an experiment having two basic conditions such as A and B, experiencing condition A before B might affect the person differently than experiencing B before A, which is a carryover effect. The technique of controlling sequence effects is called counterbalancing, in which more than one sequence is used. There are two general categories of counterbalancing depending upon whether participants or subjects are tested in each experimental condition just one time or are tested more than once per condition. A discussion follows.





1. When participants are tested once per condition 


In some experiments, participants are tested in each of the conditions but tested only once per condition. When participants are tested once per condition in a within-subjects design, one easy solution to the problem of sequence effects is to use complete counterbalancing, where every possible sequence is used at least once. The total number of sequences needed is determined by X!, where X is the number of conditions and '!' means the mathematical calculation of a factorial. For example, in an experiment with three conditions, there are six possible sequences that can be used:

X! = 3! = 3 x 2 x 1 = 6

The six sequences in a study with conditions A, B and C would be:

ABC    BAC
ACB    CAB
BCA    CBA
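Complete counterbalancing amounts to listing every permutation of the conditions. The following is a minimal sketch, assuming conditions are labelled by single letters; the function name is illustrative and not taken from the text.

```python
from itertools import permutations
from math import factorial

def complete_counterbalancing(conditions):
    """Return every possible order in which the conditions can be presented."""
    return ["".join(order) for order in permutations(conditions)]

sequences = complete_counterbalancing("ABC")
print(sequences)
print(len(sequences) == factorial(3))  # the number of sequences equals 3!
```

Because the count grows as a factorial (four conditions already need 24 sequences), complete counterbalancing is practical only for a small number of conditions.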




Partial counterbalancing is another way to solve the problem of sequence effects when subjects are tested once per condition in a within-subjects design. Whenever a subset of the total number of sequences is used, it is referred to as partial counterbalancing. In such a situation, it is possible to neutralize the sequence effects by the following rules:

(a) Every value of the independent variable (or condition) should occur the same number of times at each ordinal position.
(b) Every value of the independent variable (or condition) precedes and follows every other value an equal number of times.

Suppose that the experimenter has used four conditions (or independent values) such as A, B, C and D, which would require 24 different sequences (4 x 3 x 2 x 1). Out of these 24 different sequences, the following four orders may be chosen to satisfy the above two-point rules:

ABDC
BCAD
CDBA
DACB

In this selection, we find that each condition occurs once at each ordinal position and each condition precedes and follows every other an equal number of times. Summing up data for all subjects will neutralize the differential impact of sequential effects.
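The two rules of partial counterbalancing can be checked mechanically. The sketch below builds a balanced set of orders with a commonly used construction for an even number of conditions and then tests both rules. It assumes that rule (b) refers to immediate precedence (one condition coming directly before another); the function names are illustrative and not from the text.

```python
from collections import Counter

def balanced_latin_square(conditions):
    """Build a balanced set of orders (this construction needs an even number of conditions)."""
    n = len(conditions)
    if n % 2:
        raise ValueError("this construction needs an even number of conditions")
    # First row follows the index pattern 0, 1, n-1, 2, n-2, ...
    first, low, high = [0], 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every index of the first row by one more step
    return ["".join(conditions[(i + shift) % n] for i in first) for shift in range(n)]

def satisfies_rules(orders):
    """Check rule (a), ordinal positions, and rule (b), immediate precedence."""
    conditions = sorted(orders[0])
    n = len(conditions)
    for pos in range(n):
        counts = Counter(order[pos] for order in orders)
        if len(counts) != n or len(set(counts.values())) != 1:
            return False
    pairs = Counter((o[i], o[i + 1]) for o in orders for i in range(n - 1))
    expected = {(x, y) for x in conditions for y in conditions if x != y}
    return set(pairs) == expected and len(set(pairs.values())) == 1

orders = balanced_latin_square("ABCD")
print(orders, satisfies_rules(orders))
```

With four conditions, four orders are enough to satisfy both rules, against the 24 needed for complete counterbalancing.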


2. When participants are tested more than once per condition


There are experiments where it becomes necessary for subjects to experience each condition more than one time. It often occurs in experiments in the field of sensation and perception. Suppose in an experiment on the Müller-Lyer illusion, the experimenter is interested in measuring the extent of such illusion if the lines are presented in four different conditions: A, B, C, D.


A = Horizontal    B = 45° to the left    C = 45° to the right    D = Vertical

Figure 21.10 Set of four Müller-Lyer illusions: horizontal, 45° to the left, 45° to the right and vertical.


The sequence effects occurring in this experiment may be controlled by reverse counterbalancing and block randomization. In reverse counterbalancing the experimenter simply presents the conditions in one order, then presents them again in reverse order. In the experiment on illusions, the order under reverse counterbalancing would be A-B-C-D-D-C-B-A. If the experimenter wants to have the subjects perform the task more than twice per condition, this sequence would be repeated as many times as necessary. One problem with reverse counterbalancing is that the participants can predict what is coming next.

Another way to present a sequence of conditions when each condition is presented more than once is that of block randomization. The procedure is also utilized in the context of how to assign subjects randomly to groups in a between-groups experiment. When block randomization is used in a within-groups design, it is ensured that every condition occurs once before any condition is repeated a second time. Within each block, the order of conditions is randomized and this strategy eliminates the possibility that the subjects can predict which condition will be coming, as happens in reverse counterbalancing. In the above experiment on illusions, the participants would encounter all four conditions in a randomized order, then all four again but in a block with a new randomized order, and so on for as many blocks of four as needed. A block randomization technique may produce either of the following two sequences:

CABDABDC
BCDACADB
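Both procedures for repeated testing are easy to generate by program. The following is a minimal sketch, assuming single-letter condition labels; the function names and the optional `rng` argument are illustrative choices and not part of the text.

```python
import random

def reverse_counterbalancing(conditions, repeats=1):
    """Present the conditions in order, then in reverse order; repeat as needed."""
    once = list(conditions) + list(reversed(conditions))
    return once * repeats

def block_randomization(conditions, n_blocks, rng=None):
    """Shuffle the full set of conditions afresh within each block."""
    if rng is None:
        rng = random.Random()
    sequence = []
    for _ in range(n_blocks):
        block = list(conditions)
        rng.shuffle(block)  # every condition occurs once before any is repeated
        sequence.extend(block)
    return sequence

print("".join(reverse_counterbalancing("ABCD")))        # ABCDDCBA
print("".join(block_randomization("ABCD", n_blocks=2)))  # two shuffled blocks of four
```

The block-randomized output differs from run to run unless a seeded generator is passed in, which is what keeps subjects from predicting the next condition.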


These counterbalancing techniques help in reducing sequence effects but they do so imperfectly. The counterbalancing technique assumes that the sequence effects are the same from one sequence to another, which may not always be true, especially for carryover effects. Such effects may create problems that counterbalancing might not cure. Suppose, for example, that subjects are required to learn two mazes: Maze A and Maze B. Learning of Maze A may produce some insight about learning mazes in general but the same insight does not come from learning of Maze B. If this really happens then in the sequence Maze A → Maze B, there is a positive transfer that carries over to the learning of Maze B. On the other hand, solving Maze B first will not produce any such positive transfer to the learning of Maze A in the sequence B → A. It means that the two sequences would produce what Poulton (1982) named as asymmetric transfer. This obviously means that one sequence will produce an outcome that is not matched by the counterbalanced sequence. If the experimenter suspects asymmetric transfer, it would be wise to switch to a between-subjects design if such a switch is feasible.





COMPARISON OF BETWEEN-SUBJECTS DESIGN AND WITHIN-SUBJECTS DESIGN

As we have discussed in detail both the within-subjects design as well as the between-subjects design, we are in a position to make a comparative evaluation of these two designs.

1. In a within-subjects design the same group is used under all experimental conditions whereas in a between-subjects design, a separate group serves under each experimental condition. Obviously, then, there is the advantage of economy of subjects in a within-subjects design. Since each subject participates in all experimental conditions in the within-subjects design, the data for all subjects completing the experiment are available in perfectly valid form.

2. Since the experimenter repeats the measures on the same group of subjects in a within-groups design, the factor of individual difference is automatically controlled. This reduces the error variance. But in the between-groups design, which uses different groups of subjects, individual differences have the unfortunate effect of adding to the error variance.

3. If the experimental situation is such that preparation for the experiment requires a fair amount of time and patience (mostly true in the case of neuropsychological and psychophysiological experiments), the within-groups design is preferred to the between-groups design because the same group of subjects, with all its initial preparation, can more easily be tested under different conditions of the experiment than different groups of subjects in different conditions. For example, suppose the problem of the experiment requires that electrodes be fitted to animals. It is then more economical that the same animals work under all the experimental conditions so that the necessary preparation can be minimized.







4. If the repeated presentation of conditions is likely to have an effect upon the dependent variable and there are obvious practice and fatigue effects, the between-subjects design should be preferred. In other words, when the experimenter expects asymmetric transfer, the between-subjects design is the better choice. Asymmetric transfer refers to the fact that the experimental treatment has a different effect when the subject goes from condition A to B than when he goes from B to A (Poulton 1982). Poulton is one of the [...] asymmetric transfer. Let us illustrate this with an example. Suppose the experimenter conducts an experiment to examine the effect of knowledge of results (KR) upon a sensory-motor task such as a line-drawing task using a within-groups design. Further suppose that, in order to counterbalance the effects of practice and fatigue, he uses the ABBA order. He introduces first the KR condition in 10 trials, followed by the without knowledge of results (WKR) condition in the next 20 trials and again the KR condition in the next 10 trials. Here there may be asymmetric transfer on the part of subjects, which may make the results unreliable. In the KR condition the subjects are given the knowledge of the actual length drawn by them each time. After the practice of 10 trials they may develop some skills and special cues, which may help in a line-drawing task. Subsequently, when the WKR condition is introduced where they are not given the knowledge of the length drawn, they are likely to profit by their previous skills and cues. This may have the effect of leading to a conclusion that there is no difference between the KR condition and the WKR condition. Obviously, then, this would be an unreliable conclusion undermining the importance of the knowledge of results. If instead, the between-groups design is used in which one group is given KR and another group is given WKR, it is likely that the performance of these two groups would differ significantly. Probably, the group receiving the KR treatment would be more accurate in the line-drawing task than the group receiving the WKR treatment.

Although between-subjects designs are considered more powerful, there are four reasons for choosing a within-subjects design.

(i) When only a small number of subjects is available, within-subjects designs are preferred.

(ii) Within-subjects designs are especially useful when no good matching tasks exist with which to equate the several independent groups of subjects.

(iii) Within-subjects designs possess comparatively a greater degree of convenience or efficiency (Posner 1973).

(iv) Within-subjects designs are especially needed for those areas of psychological research which require studying changes in subjects' behaviour over time. Panel or longitudinal studies and areas like psychophysics and scaling are most appropriate for such designs.

However, within-groups designs are not considered appropriate to study subject variables like age, sex and the like because such subject characteristics can't be successfully balanced.


EXPERIMENTAL DESIGN BASED UPON THE CAMPBELL AND STANLEY CLASSIFICATION

Apart from the experimental designs (excluding factorial design) discussed above, Campbell & Stanley (1963) have discussed 16 designs ranging from the poorest to very strong ones, which have proved very useful in psychological and educational researches. It is not possible to discuss all the 16 designs here. However, some of the important designs will be selected and discussed. For a detailed discussion readers are referred to Campbell & Stanley (1963).

In discussing their 16 experimental designs, Campbell & Stanley have used some symbols with which a reader is expected to be acquainted.

R: Random selection of subjects or random assignment of treatment to experimental groups.
X: Treatment or experimental variable which is manipulated. When treatments are compared, they are labelled as X1, X2, X3, and so on.
O: Observation or measurement or test. Where there is more than one O, an arbitrary subscript O1, O2, O3, and so on, is used.

When one or more X and O occur in the same row, they indicate that these are being applied to the same persons. The left-to-right dimension in which X and O occur indicates temporal order, and when X and O occur in vertical order to each other, they indicate that these two are simultaneous. The parallel rows of symbols unseparated by a dashed line indicate that groups have been equated by randomization and when separated by dashed lines, they indicate that the groups have not been equated by randomization.
aw : “ua \ 


PRE-EXPERIMENTAL DESIGNS (NONDESIGNS)

There are three designs which actually do not qualify for the experimental designs because they do not provide a control group or the equivalent of a control group. Such designs do not adequately control the threats to internal validity. These designs are called pre-experimental designs because they do not incorporate the basic elements of an experimental design. Because these designs are inadequate in themselves, they are also sometimes referred to as nondesigns. The following are the three pre-experimental designs.

One-Shot Case Study

The one-shot case study may be diagrammed as indicated below.

X O
As its name suggests, in this design the treatment X is given to a single group and subsequently, an observation (O) is made to assess the effects of treatment upon the group. This design has two important limitations. One is that it does not provide a control group and another is that it does not provide any information regarding the members who are given the treatment. Because of these limitations, there is little justification for concluding that X caused O. Let us take an example. Suppose the principal of a college introduces the practice of giving monetary reward (X) to students who regularly attend their classes. After this practice has been in operation for a year, the principal observes that the students attend their classes regularly and disruptive activities in the classroom are minimized (O). On this basis the principal concludes that with the practice of giving monetary reward, the absenteeism and the disruptive activities in the classroom are reduced. This conclusion is, however, dubious because the principal does not know (a) whether or not factors other than monetary reward have contributed to the observed change in behaviour; and (b) whether there was a real change in the observed behaviour relative to their past behaviour.


One-Group Pretest-Posttest Design

This design is an improvement over the above design because the effects of treatment (X) are judged by making a comparison between pretest and posttest scores. However, no control group is used in this design. The design may be diagrammed as shown below.

O1 X O2

Suppose the principal of a college wants to study the effect of movies in changing the attitude of a group of students. He first obtains some initial measures of attitude (O1) and then the students are asked to see the film (X), which intends to bring a change in attitude. Subsequently, a measure of attitude change may be obtained (O2). This design controls two extraneous variables (that is, selection and experimental mortality), which are likely to pose threats to the internal validity of the experiment. The pretest scores indicate the initial state of the selected subjects and the posttest scores indicate the state of the experimental mortality of the subjects. However, the extraneous variables or the sources of internal invalidity like history, maturation, testing and instrumentation are not controlled by this design.


Static Group Comparison (or Intact-group Comparison)

In this design, two groups are taken. One group (O1) experiences the experimental treatment (X) and another group does not experience the experimental treatment (O2). Subsequently, these two groups are compared. The design can be diagrammed as shown below.

X O1
- - - - -
  O2

Thus in this design, a control group (the group receiving no treatment) is used as a source of comparison for the treatment-receiving group or the experimental group. Because a control group is used, sources of internal invalidity like history, testing and instrumentation are controlled. The factor of statistical regression is also controlled. One of the main demerits of this design is that subjects of the control and the experimental groups are neither selected at random nor are they assigned randomly to the groups. Thus the two groups are not equivalent (as there are no pretests) and hence the factor of selection is not controlled. As the samples are not randomly drawn, the interaction effects of selection (a source of external invalidity) are also not controlled. The dashed line indicates that the control group (O2) and the experimental group (O1) have not been equated by randomization. In fact, these three pre-experimental designs are examples of how not to do research if there are alternatives.

TRUE EXPERIMENTAL DESIGNS

There are three experimental designs, which are called true experimental designs. In these designs the control group and the experimental groups are formed and their equivalence is established through randomization. These designs are called true experimental designs because all factors or variables contributing to internal invalidity are controlled. Although true experimental designs are the strongest type of designs, in some situations it is difficult to conduct experiments based upon such designs. Three true experimental designs are presented as mentioned below.

Posttest-Only, Equivalent-Groups Design

This design is the most effective and useful true experimental design, which minimizes the threats to the experimental validity. The design can be diagrammed as shown below.

R X O1
R   O2

In the above design, there are two groups. One group is given the treatment (X), usually called the experimental group, and the other group is not given any treatment, called the control group. The use of a control group automatically controls the two extraneous variables, namely, history and maturation. Both groups are formed on the basis of random assignment of the subjects and hence, they are equivalent (there is no dashed line). Not only that, subjects of both the groups are initially randomly drawn from the population (R). This fact controls for selection and experimental mortality. Besides these, in this design no pretest is needed for either group, which saves time and money. As both the groups are tested after the experimental group has received the treatment, the most appropriate statistical tests would be those tests which make a comparison between the mean of O1 and O2. Thus either t test or ANOVA is used as the appropriate statistical test.

Let us take an example. Suppose the experimenter, with the help of the table of random numbers, selects 50 students out of a total of 500 students. Subsequently, these 50 students are randomly assigned to two groups. The experimenter is interested in evaluating the effect of punishment over retention of a verbal task. The hypothesis is that punishment enhances the retention score. One group is given punishment (X) while learning a task, and another group receives no such punishment while learning a task. Subsequently, both groups are given the test of retention. A simple comparison of mean retention scores of the two groups, either through the t test or ANOVA, would provide the basis for refuting or accepting the hypothesis.
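The selection, assignment and comparison steps of this example can be sketched in a few lines. The sketch below is only an illustration: a seeded random number generator stands in for the table of random numbers, the retention scores are simulated rather than real, and the function name `independent_t` is a hypothetical helper that computes the pooled-variance t statistic for two independent groups.

```python
import math
import random
import statistics

def independent_t(x, y):
    """Pooled-variance t statistic for two independent groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se

rng = random.Random(2024)               # fixed seed so the sketch is repeatable
population = list(range(1, 501))        # 500 student identifiers
selected = rng.sample(population, 50)   # random selection of 50 students
rng.shuffle(selected)                   # random assignment to two groups
experimental, control = selected[:25], selected[25:]

# Simulated retention scores (made-up values, for illustration only)
o1 = [rng.gauss(22, 4) for _ in experimental]
o2 = [rng.gauss(20, 4) for _ in control]

df = len(o1) + len(o2) - 2
print(f"t({df}) = {independent_t(o1, o2):.2f}")
```

The obtained t would then be compared with the critical value for 48 degrees of freedom at the chosen level of significance.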


Pretest-Posttest Control Group Design

This design is similar to the previous one except for the fact that it also makes a provision for pretest for both groups before experimental and control treatments are administered. The design may be diagrammed as shown below.

R O1 X O2
R O3   O4


It is obvious from the above diagram that the design has two groups. One group receives treatment (X) and another group receives no such treatment. By use of the control group, this design controls some sources of internal invalidity like history, maturation and statistical regression. As subjects are randomly assigned to the control group and the experimental group, factors like selection and experimental mortality, posing threats to internal validity, are also controlled. It is also obvious from the above diagram that both groups are given a pretest and a posttest. The element of pretest, however, is the basic limitation of the design. In this design there is no control over the gain on the posttest due to the experience of the pretest (called testing effect), which may threaten the internal validity of the experiment. Not only this, there is no control over the possible sensitization to the treatment, which the subjects might develop due to the pretest, reducing the external validity of the experiment. In other words, it can be said that the interaction of testing and treatment (a source of external invalidity) is not controlled. The pretest-posttest control group design can be statistically analyzed by arranging the data in the form of gain scores for the control group and the experimental group. In other words, O2 - O1 can be compared with O4 - O3 so that it may be ascertained whether or not the treatment has a differential impact upon the groups. If the groups are wholly equivalent, the posttest scores alone may be compared in ascertaining the impact of treatment upon the groups.

Solomon Four-Group Design

The Solomon four-group design developed by Solomon (1949) is really a combination of the two equivalent-groups designs described above, namely, the posttest-only design and the pretest-posttest design. In this design, subjects are randomly assigned to two experimental groups and two control groups. Only one experimental group and one control group receive a pretest. All four groups receive a posttest. The design may be diagrammed as shown below.

R O1 X O2
R O3   O4
R    X O5
R      O6

It is clear from this diagram that in this design, four groups are randomly set by the experimenter. As a matter of fact, in this design, two simultaneous experiments are conducted and hence, the advantages of replication are available here. The effect of X (treatment) is replicated in four ways: O2 > O1; O2 > O4; O5 > O6 and O5 > O3. This design makes it possible to evaluate the main effects of testing as well as the reactive effect (or interaction) of testing, maturation and history, thus increasing the external validity or generalizability. The factorial analysis of variance can be used as the appropriate statistical test. Because the design is complex from the methodological as well as the statistical point of view, it is less preferred to the above two true experimental designs.

Latin Square Design

This experimental design is used to examine whether the order or sequence in which subjects or participants receive multiple versions of the treatment has an effect upon the dependent variable. Thus when a researcher is interested in how several treatments given in different sequences or time orders affect a dependent variable, a Latin square design is used. For example, suppose a teacher of geography has three fundamental units to teach students: how to use a compass, map reading and the longitude-latitude (LL) system. These units may be taught in any order but the teacher is interested in knowing which order proves to be most beneficial to students for learning. In one class the order of teaching may be map reading, then using a compass and then the LL system. In another class the order may be using a compass, map reading and then the LL system. In yet another class the order may be the LL system, then compass usage and then map reading. The teacher gives an examination after each unit and students take a comprehensive examination at the end of the entire term. Since the students are randomly assigned to three classes, the teacher can easily determine whether or not presenting units in a specific order resulted in improved learning.


QUASI-EXPERIMENTAL DESIGNS

A quasi-experiment is one that applies an experimental interpretation to results that do not meet all the requirements of a true experiment. In a quasi-experiment, the situation is such that the experimenter has some control over the manipulation of independent variables but fails to meet the other basic requirement of a true experiment, that is, creating equivalent groups. Thus, a quasi-experiment is basically an attempt to simulate the true experiment and, therefore, is also called a compromise design. Quasi-experimental designs are partly like true experimental designs. They control some but not all extraneous variables, which give threats to internal validity of the experiment. The quasi-experimental designs provide control over when and to whom the measurement is applied but as subjects are not randomly assigned to the experimental and the control group, the equivalence of the groups is not maintained, and thus, it leaves some uncontrolled threats to the validity of the experiment. In some cases the experimenter does not truly manipulate the independent variable. Such variables are called 'quasi-independent variables' and studies that employ them are called quasi-experiments, where we lay out the design and compare the scores between the different conditions as in a true experiment, but we only appear to administer the variable. In a quasi-experiment, the researcher cannot randomly assign subjects to be exposed to a particular condition. Instead, subjects are assigned to a particular condition because they already qualify for that condition due to some inherent characteristics. Age, sex, background experiences or personality characteristics are some of the examples of quasi-independent variables. Such designs are better than pre-experimental designs, but they are not as adequate as the true experimental designs. In a way, then, the quasi-experimental designs are somewhere in between the pre-experimental designs and the true experimental designs. Campbell & Stanley (1963) have discussed several types (excluding the variants of some types) of quasi-experimental designs, some of which are presented below.

Time-Series Design

Sometimes it happens that a control group or a comparison group cannot be included in an experiment because of the situation in which the experiment is being conducted. Still, the experimenter wants to have a design which may exercise a better control over the extraneous variables. The time-series design is one such design, which can be followed in the situation described above. This design can be diagrammed as indicated below.

O1 O2 O3 O4 X O5 O6 O7 O8

It is obvious from the above diagram that a series of pretests are given to the group. Subsequently, the treatment (X) is given and a series of posttests are given to the same group. This design differs from the single group pretest-posttest design because instead of giving a single pretest and posttest, a series of pretests and posttests are given. The extraneous variables like maturation, testing, selection and experimental mortality are well controlled. However, a variable like history is not controlled because the subjects are exposed daily to the different kinds of stimulations beyond those under the control of the experimenter. In analyzing the results of the time-series design, simple statistical comparisons between O4 and O5 are avoided. However, comparisons among all other pairs of adjacent points are recommended.

Equivalent Time-Samples Design

The equivalent time-samples design is an extension of the time-series design with the repeated introduction of the experimental variable. Like the time-series design, a single group is used and the group is exposed to repeated treatments in some systematic way. The design may be diagrammed as shown below.

X1 O1   X0 O2   X1 O3   X0 O4

It is obvious from the above diagram that in the equivalent time-samples design the treatment is introduced and reintroduced. History, the major weakness of the time-series design, is well controlled by presenting X on several occasions. Other extraneous variables such as selection and experimental mortality are also controlled. Thus this design provides not only an extension but also an improvement over the time-series design. The design can be illustrated with the following example.

Suppose the experimenter wants to study the effect of showing a nationalization film on the attitude towards nationalization of a group of students. The experimenter shows the film to a group of students (X1). After that, he takes a measure of attitude change towards nationalization (O1). After a few days, the experimenter simply discusses the general usefulness of nationalization with them in their class (X0) and a test of their attitude can be made (O2). After a few days, the group witnesses the same film and the measures of attitude may be obtained (O3). Following this, the experimenter discusses the general aspect of nationalization (X0). Their attitude towards nationalization is again tested (O4). The statistical comparison of O1 and O3 with O2 and O4 helps the experimenter in evaluating the effect of the film.

Although the equivalent time-samples design is an improvement over the time-series design [...]





a them asi Bac, 

Cannot randomly assign 

; 4 their equi alence is SUSpRTe 

. Oneg Hvalent control group de 
bel 


y. 








As the experimenter is not allowed to randomly assign subjects to the experimental and the
control groups, their equivalence is not granted. This creates difficulty in controlling a
variable like selection. However, this difficulty can be overcome by comparing the intact groups
on the pretests (O1 and O3). For the satisfaction of the experimenter on the criterion of
equivalence, the intact groups can also be compared on any control variables relevant to
selection and potentially relevant to the treatment, such as age, sex, experience, and so on. The
statistical analysis consists of comparing the mean gain score of the treatment group (O2 - O1)
with the mean gain score of the nontreatment group (O4 - O3). The nonequivalent control group
design, however, should not be used where one of the intact groups consists of volunteers and the
other of nonvolunteers. This is because the volunteers are different from nonvolunteers and a
comparison of the two does not provide for control of a variable like selection.

Consider an example to illustrate this design. Suppose the experimenter wants to know the
effect of a week's training intended to improve the map-drawing skill among students of a
geography class. For this purpose, the principal of a school provides the experimenter with two
classes of geography, each consisting of 20 students. The principal does not permit randomization
of these two groups in any way and requests the experimenter to handle them as intact groups.
Following the request of the principal, the experimenter decides to use the nonequivalent control
group design. Which of these two groups would act as the control group and which as the
experimental group was decided randomly by the experimenter by flipping a coin. The experimenter,
however, ascertained the equivalence of these two intact groups on age, sex, presence of physical
disorders and so on, and was satisfied that these groups might be regarded as equivalent in all
these respects. Both the groups were administered the map-drawing test as pretest measures (O1
and O3). Subsequently, the experimental group was given training in map drawing for one week (X)
and the control group was not given any such training. After that both groups were readministered
the map-drawing test (O2 and O4). The mean gain scores (posttest minus pretest) of the two groups
were compared and it was found that the difference between O2 - O1 and O4 - O3 was positive and
statistically significant.
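The gain-score comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the scores are hypothetical values invented for the example (they are not data from the study described), and the function name `mean_gain` is an assumption introduced here. The sketch computes the mean gain of each intact group and their difference; testing that difference for statistical significance would require an appropriate test, which is not shown.

```python
from statistics import mean


def mean_gain(pretest, posttest):
    """Return the mean of the posttest-minus-pretest gains for one intact group."""
    return mean(post - pre for pre, post in zip(pretest, posttest))


# Hypothetical map-drawing scores (illustrative only, not data from the text).
exp_pre, exp_post = [10, 12, 9, 11], [15, 16, 13, 16]   # O1 and O2
ctl_pre, ctl_post = [11, 10, 12, 9], [12, 11, 12, 11]   # O3 and O4

exp_gain = mean_gain(exp_pre, exp_post)   # mean of O2 - O1
ctl_gain = mean_gain(ctl_pre, ctl_post)   # mean of O4 - O3
difference = exp_gain - ctl_gain          # positive if the trained group gained more

print(exp_gain, ctl_gain, difference)
```

A positive difference only suggests a treatment effect; because the groups are intact rather than randomly formed, selection remains a rival explanation unless the pretest comparison supports their equivalence.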

Counterbalanced Design 

In counterbalanced designs, the experimental control is achieved by applying all the
experimental treatments to every group in a balanced order. Such designs are called crossover
designs (Cochran & Cox 1957, Cox 1958), switchover designs (Kempthorne 1952) and rotation
experiments by McCall. The name counterbalanced designs was, however, given by Underwood (1949).
In counterbalancing the treatments, generally the Latin-square arrangement, in which each
treatment appears once and only once in each column and in each row, is used. A counterbalanced
design in which four treatments have been randomly given to four groups on four different
occasions is diagrammed below.


GhA VOLO Lo \,0 
GiB VO YO Ww Ao 
GEC VO WO AO Ww 
GeD V0 4.0 149 (oO 
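The Latin-square requirement stated above (each treatment appears once and only once in each row and in each column) can be checked mechanically. The following Python sketch builds one possible square by cyclic rotation and verifies the property. The cyclic construction is an assumption chosen for simplicity and is not necessarily the arrangement printed in the diagram; the labels X1 to X4 and Groups A to D are used only for illustration.

```python
def cyclic_latin_square(treatments):
    """Build an n x n square by rotating the treatment list one step per row."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


square = cyclic_latin_square(["X1", "X2", "X3", "X4"])

# Each row is the treatment order for one group; an observation (O) follows each treatment.
for group, order in zip("ABCD", square):
    print(f"Group {group}: " + "   ".join(f"{x} O" for x in order))

print(is_latin_square(square))
```

In practice the rows (or the assignment of groups to rows) would be randomized, as the text notes; the cyclic square is only a convenient starting point.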


Variables like history, maturation, testing, instrumentation, selection, regression and
experimental mortality (posing threats to internal validity) are well controlled by
counterbalanced designs. One source of external invalidity, however, creeps into the
counterbalanced design, namely multiple-X interference, which lowers the generalization of the
results.



Separate-Sample Pretest-Posttest Design

The separate-sample pretest-posttest design is specially suited to those situations in which the
experimenter cannot administer treatments to all subjects at a time. Hence, he is forced to take a
sample and administer the treatment. Then, again another sample is taken and the treatment is
administered. Consider a situation in which there are 2,000 persons who are to be trained but the
experimenter cannot train more than 200 persons at a time. In such a situation, the training must
run continuously, each time with a new set of persons, and at the same time the experimenter
cannot withhold treatment from any person. Consequently, he cannot compare training conditions
and nontraining conditions. As such, no true experimental design can be applied here. To deal
with such a situation, a separate-sample pretest-posttest design may be adopted. For this
purpose, the experimenter decides to take a one-group pretest-posttest design and repeat it as
shown below.

O1   X   O2
              O3   X   O4
If the one-group pretest-posttest design is used without repetition, a variable like history is
not controlled because the subjects may be influenced by several events occurring simultaneously
with the treatment (X) that might have produced O2. But if that design is repeated, thus
constituting a separate-sample pretest-posttest design, a variable like history is controlled
because it is less likely that the same events would have occurred simultaneously with X on both
administrations. If O2 exceeds O1 and O4 exceeds O3, we have evidence for concluding this.
Despite this, the separate-sample pretest-posttest design fails to control three factors of
internal invalidity, namely testing effects, maturation, and the interaction of selection and
maturation.


Patched-up Design 


In a patched-up design the experimenter starts with an inadequate design and then adds some
features so that recurrent factors producing invalidity may be maximally controlled. The
patched-up design shown below is a combination of two different pre-experimental designs.

O   X   O          (one-group pretest-posttest design)

X   O
    O              (intact-group comparison design)

These two designs are inadequate in themselves because they fail to control many extraneous
variables which threaten the internal validity. For example, the former design fails to control
history and maturation but controls selection, whereas the latter design fails to control
selection but controls maturation and history. Hence, a patched-up design may be built on the
basis of these two designs so that their mutual limitations may be overcome. The resulting
patched-up design may be diagrammed as shown below.
Class A          X   O1
Class B   O2     X   O3

In the above diagram, the O2 and O3 comparison represents the one-group pretest-posttest
comparison and controls variables like selection to a greater extent but fails to control history
and maturation. Likewise, the O1 and O2 comparison is an intact-group comparison and controls
maturation and history but fails to control selection. If the O2 versus O3 and the O1 versus O2
comparisons can be regarded as equal, then history, maturation and selection are automatically
controlled so that the outcome cannot be accounted for in terms of these extraneous variables.

The illustration of a patched-up design given above reveals some interesting facts about this
design. In a patched-up design every subject gets the treatment and, therefore, the experimenter
cannot withhold treatment by assigning some subjects to the control group. The experimenter can,
however, control when and to whom the treatment is given at a particular time during the course
of experimentation. The essence of the patched-up design is that groups, as in a separate-sample
pretest-posttest design, are tested in sequence and subsequently, they are compared.

Apart from the above quasi-designs, there are also other quasi-independent variables that have
been widely studied. One very widely studied quasi-independent variable in psychology is the
passage of time. The field of developmental psychology is built around age, focusing on how it
relates to various kinds of development in social, emotional and cognitive behaviour. Passage of
time is also widely studied in other settings where we compare experienced with inexperienced
workers or study changes in memory abilities with advancement in age. These are
quasi-independent variables because the experimenter cannot randomly assign people to be of a
certain age or to experience only a certain amount of work. There are three general designs for
studying the passage of time: longitudinal design, cross-sectional design and cohort design.
These may be discussed as under.

Longitudinal Design 


In longitudinal design, the researcher repeatedly measures a group of subjects in order to
observe the effect of passage of time. Such designs, however, are confounded by extraneous events
that occur during the study and they may not generalize over time. For example, the researcher
may like to study vocabulary development in a group of female children by testing them yearly
from ages five to ten. A diagram showing the plan for such a study has been presented in
Table 21.19.


Table 21.19 Diagram of longitudinal study showing repeated observations of each subject at 
different ages 





Age in years | 








In Table 21.19, X and X̄ show the score and mean respectively. Adding vertically, any
differences between the mean scores for the conditions reflect changes in vocabulary as a
function of age.

Longitudinal study can also be done over a shorter period of time. For example, Nelson and 
Sutton (1990) conducted one study in which they examined white-collar workers over a 
nine-month period to determine how they coped with work-related stress. 

Longitudinal study has some advantages and disadvantages. Important advantages are as
follows.

(i) This keeps subject variables reasonably constant between the conditions. For example,
in the above study, planned and shown in Table 21.19, the researcher is able to keep
variables like genetic make-up, their parents, and so on, constant because the same
children, as they age, are being observed.

(ii) Such a study avoids problems associated with sample nonequivalence.

(iii) It permits the formulation of cause-and-effect statements with more certainty than other
research designs.


Despite these advantages, longitudinal study has some major disadvantages, which are as
follows.

(i) Longitudinal design is inherently confounded by any extraneous variable that the
subjects experience during the study.

(ii) A longitudinal study carried out over a very lengthy period may not generalize adequately,
because society and culture are regularly and constantly changing. Such a study, therefore,
lacks temporal validity.

(iii) A longitudinal study requires repeated measures, and the results may be confounded by
carry-over effects or by response sets, and by dropouts because of death or other reasons
during the study. The researcher cannot control the environment of the subjects between
testing periods.

(iv) This study locks the researcher into an earlier research design and theory.

(v) This research design is expensive in time and money. It also jeopardizes the whole study
if research funds give out.

Cross-Sectional Design

Cross-sectional design is a quasi-experiment in which the researcher observes a cross-section
of ages at one time, for example, by testing the vocabulary of a group of 5-year olds, another
group of 6-year olds, and so on. Templin (1957) provided us with a good illustration of a
cross-sectional study. Templin investigated language development by selecting sixty children at
each of a number of age levels: 3, 3.5, 4, 4.5, 5, 6, 7 and 8 years. Table 21.20 presents a
diagram of a cross-sectional study in which observation of different groups of subjects from
ages 5 to 10 with respect to vocabulary development has been shown.

Table 21.20 Diagram of cross-sectional study showing observation of subjects of different age
groups

In Table 21.20, X represents vocabulary score and X̄ represents mean of different age groups.

Cross-sectional design also has some advantages and disadvantages. The following are the
major advantages.

(i) Cross-sectional study can be conducted rather quickly and easily. Thus it saves a great
deal of time.

(ii) It costs less than longitudinal study.

(iii) Such a study demands no continuity on a long-term basis for obtaining cooperation
among research workers.

(iv) Such a study does not require that the data be frozen over long periods until subjects
reach the desired time in retesting.

Cross-sectional study has some disadvantages too. In a cross-sectional study, the researcher
observes not the same children at different ages but different children at each age, so the
study cannot reveal the continuity of development in a single individual. The comparability of
the groups being studied is also uncertain. The researcher can never be sure that the reported
age-related differences between subjects are not the product of other differences between the
groups, namely, differences in intelligence, diet or sociocultural environment. This confounding
is called the cohort effect, which occurs when age differences are confounded by differences in
subject history. Thus, the comparability of the groups can only be assumed. Several developmental
psychologists have demonstrated that this assumption is frequently invalid (Baltes 1968,
Woodruff and Birren 1972).

Despite these disadvantages, cross-sectional designs provide an immediate snapshot comparison
of subjects who differ in age or time-related experiences. Such comparisons are most dependable
to the extent that subjects are matched across conditions and confoundings are eliminated.

Cohort Design

In cohort design the researcher conducts a longitudinal study of several groups, each from a
different generation. Suppose, for example, the researcher studies vocabulary development in one
generation of children beginning when they are five years old in 1990, in another generation of
children beginning when they are five years old in 1995, and in another generation of children
beginning when they are five years old in 2000. This design has been set up and analyzed in
Table 21.21.

Table 21.21 Diagram of a cohort study showing repeated measures over ages

Generation          Age in years                           Generation
                    5     6     7     8     9     10       main effect
1990 subjects       X     X     X     X     X     X        X̄
1995 subjects       X     X     X     X     X     X        X̄
2000 subjects       X     X     X     X     X     X        X̄
Age main effect     X̄     X̄     X̄     X̄     X̄     X̄

In Table 21.21, X represents vocabulary score and X̄ represents mean of scores. The columns
denote the repeated testing of a child at different ages beginning from 5 to 10 years, and the
rows represent three respective generations. Collapsing vertically over the three generations,
mean scores for each age are obtained and the difference between the means reflects the main
effect of different ages irrespective of the subject's generation. This operation provides
knowledge about longitudinal or developmental information. On the other hand, when we add scores
horizontally, the researcher gets information about the main effect of generation, that is, the
mean of each generation irrespective of age. If no significant mean differences are found, the
researcher does not conclude that scores differ on the basis of the subject's generation.
However, if there are significant differences between these three groups, then a cohort effect
is said to be present and the developmental results are less likely to generalize to other
generations. Likewise, there may be interaction between age and generation. If there is absence
of significant interaction, it would mean that changes in scores as a function of age are
similar regardless of each generation's own uniqueness and history. A significant interaction
would indicate that the type of changes visualized with age depends on which generation the
researcher is studying.
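The two ways of collapsing a cohort table described above can be sketched in Python. The numbers below are hypothetical mean vocabulary scores invented for illustration, and only ages 5 to 7 are shown for brevity (the design in the text runs from 5 to 10 years). The names `scores`, `age_means` and `generation_means` are assumptions introduced for this sketch.

```python
from statistics import mean

# Hypothetical mean vocabulary scores (illustrative only).
# Outer keys: generation (year in which the children were five); inner keys: age.
scores = {
    1990: {5: 10, 6: 14, 7: 18},
    1995: {5: 12, 6: 16, 7: 20},
    2000: {5: 14, 6: 18, 7: 22},
}
ages = [5, 6, 7]

# Collapsing vertically over generations gives the mean for each age (age main effect).
age_means = {age: mean(scores[gen][age] for gen in scores) for age in ages}

# Collapsing horizontally over ages gives the mean for each generation (generation main effect).
generation_means = {gen: mean(by_age.values()) for gen, by_age in scores.items()}

print(age_means)
print(generation_means)
```

In these made-up figures every generation rises by the same amount per year, so the age trend does not depend on generation (no interaction), while the generation means differ, which is the pattern the text calls a cohort effect. Whether such differences are significant would have to be decided by a suitable statistical test, such as the analysis of variance.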

Cohort design possesses advantages and disadvantages more or less similar to longitudinal
design.


EX POST FACTO DESIGN 
The reader has already been introduced to the ex post facto experiment discussed in Chapter 15. 
In ex post facto research, it can be said that the experimenter, instead of Creating a treatment, 
evaluates the effects of a naturalistically occurring treatment after that treatment has occurred. He 
tries to relate the outcome (or the dependent variable measure) with already occurred treatments. 
In ex post facto research, the researcher gives the treatment not by manipulation but by selection, 
Because the independent variables are handled by selection in ex post facto research, sometimes 
it is difficult to find out the cause-effect relationship between the dependent variable and the 
independent variable. 

There are two common types of ex post facto design, namely, correlational design and 
criterion-group design. 


Correlational Design 


A correlational approach (also known as the psychometric approach) is one in which the 
experimenter collects two or more sets of data from the same group of subjects so that the 
relationship between the two subsequent sets of data can be determined. While the correlations 
are the common statistics which are employed in analysis of these data, other statistics 
commonly regarded as variants of correlational techniques, can also be utilized. The 
correlational design may be diagrammed as follows: 


O1   O2

Suppose the researcher wants to investigate the relationship between the intelligence and
problem-solving ability of a group of children randomly taken from Class VI. For this purpose,
the experimenter will administer the measure (or test) of intelligence (O1) and subsequently, a
test of problem-solving ability will be administered (O2). Thus the researcher will have two sets
of data before him. He may apply appropriate correlational techniques depending upon the nature
of the data. If the obtained correlation coefficient is positive and significant, the researcher
may conclude that the higher the intelligence, the greater the ability to solve a problem. In
fact, a strong and significant relationship between O1 and O2 suggests one of three possible
meanings:
1. The variable measured by O1 has caused O2.
2. The variable measured by O2 has caused O1.
3. A third or unmeasured variable has caused both O1 and O2.


But as the experimenter has not manipulated the independent variable, it is difficult to say
which of the above three interpretations accounts for the obtained relationship. Whenever there
is a weak relationship between O1 and O2, or no relationship at all, the above three meanings
are rejected. The correlational design, thus, cannot be regarded as an adequate design for
establishing cause-effect relationship among variables.
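As a rough illustration of the correlational design, the following Python sketch computes the Pearson product-moment correlation between two sets of scores obtained from the same subjects. The scores are hypothetical values invented for the example, and the choice of the Pearson coefficient is an assumption; as the text notes, the appropriate technique depends upon the nature of the data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five children (illustrative only).
intelligence = [95, 100, 105, 110, 120]     # O1
problem_solving = [10, 12, 13, 15, 18]      # O2

r = pearson_r(intelligence, problem_solving)
print(round(r, 3))
```

A high value of r shows only that the two measures vary together. It does not indicate which of the three interpretations listed in the text is correct.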


Criterion-Group Design

In the criterion-group design, as the name implies, the experimenter tries to ascertain what has
caused a particular state or condition by contrasting the characteristics of the group which
possesses the criterion behaviour with those who do not. The criterion-group design may be
diagrammed as shown below.


CO QO '€ 6, 
eee i, re 
OO, oO, O, 


In the above diagram, the letter C indicates the selection of an experience according to the
criterion fixed by the experimenter. A close scrutiny of the design reveals that this is very
similar to the correlational design and pre-experimental design, particularly the static-group
comparison design. The criterion-group design may, therefore, be used in situations where intact
groups of criterion and noncriterion subjects are available or where subjects with criterion
characteristics are randomly selected from a larger group of subjects all qualifying on the
criterion variable. Let us take an example. Suppose the experimenter is interested in knowing the
origins of divergent thinking among a group of 50 students randomly selected from Class IX. In
other words, the purpose of the experimenter is to explore the factors, experiences, types of
personality, etc., that may cause divergent thinking among the children. As a first step, the
experimenter will administer a test of divergent thinking to the group of students. On the basis
of the test scores he would identify two criterion groups: one group consisting of those students
who score higher on the test and another group consisting of those students who score lower on
the test. Subsequently, the experimenter may try to find out the general environment in which
they were reared, the attitudes and education of their parents, etc., by taking a structured
interview of their parents. Because the information or experiences gained by the experimenter on
the basis of the interview have not been manipulated in any way, it is difficult for the
experimenter to conclude that a particular set of experiences has caused the divergent thinking.
However, this criterion-group study definitely provides some testable hypotheses in terms of the
potential causes, which may be tested in a more scientific way by using a quasi-experimental
design or true experimental design.
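The first step of the example above, forming the two criterion groups from test scores, can be sketched in Python. The text does not say where the dividing line between higher and lower scorers is drawn; splitting at the median is an assumption made only for this sketch, and the student labels and scores are hypothetical.

```python
from statistics import median


def criterion_groups(scores):
    """Split subjects into higher and lower scorers at the median (an assumed cut-off)."""
    cut = median(scores.values())
    high = sorted(name for name, score in scores.items() if score > cut)
    low = sorted(name for name, score in scores.items() if score <= cut)
    return high, low


# Hypothetical divergent-thinking scores (illustrative only).
test_scores = {"s1": 12, "s2": 30, "s3": 18, "s4": 25}

high_group, low_group = criterion_groups(test_scores)
print(high_group, low_group)
```

Once the groups are formed, their background characteristics are contrasted. Since nothing has been manipulated, any differences found are leads for hypotheses, not demonstrated causes.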


STEPS IN EXPERIMENTATION 


When the problem has been selected, the experimenter plans to conduct an experiment. The 
following are the main steps in experimentation. 


1. Label the experiment: As a first step, the experimenter should clearly label the title of the
experiment. Not only this, he should also determine the time and location of the experiment.

2. Review of the literature: The review of the literature is very relevant to the experiment
and is a very important aspect of the experimental plan. There are four reasons why a review by
the experimenter is essential. First, it helps the experimenter in formulating an appropriate
problem. Second, it also helps the experimenter in discarding any vague notion regarding a
problem already existing in his mind, so that he can modify that problem to the extent of
conducting a good experiment. Third, the review of the literature helps the experimenter in
avoiding unnecessary duplication of a problem already well studied. If the problem selected by
the experimenter is such that it has already been well studied, there seems no sense in
replicating the same unless the purpose is confirming the findings of the earlier experiment.
Fourth, the review helps the experimenter in acknowledging the ways in which the extraneous
variables may be controlled. Thus, the review of the literature is an important step in the
experimental plan. The most common sources of review are journals and books. For psychological
researches and educational researches, the Psychological Abstracts is also one important such
source.








3. Statement of the problem: Generally, the experimenter decides to conduct an experiment in a
field because there is a lack of scientific knowledge in that field. By stating a problem, he
expresses this lack of knowledge. The problem is formulated in unambiguous terms, preferably in
question form.

4. Formulation of the hypothesis and defining the variables: The next step in the experimental
plan is to formulate the hypothesis. Usually, the dependent variable and the independent
variable are identified and defined, and subsequently, the relationship between them is
expressed in the form of a sentence (called the hypothesis). This is the common language form of
expressing a hypothesis. Apart from common language, it can also be expressed in terms of
mathematical and logical symbols.

5. Preliminaries and apparatus: Preliminaries include a brief history of the subjects including
their name, age-range and educational status, although some researchers do not restrict
themselves to these items of information. Apparatus is the term commonly used in psychological
experimentation. Its major functions are two. First, apparatus helps the experimenter in
presenting the experimental treatment and second, it helps the experimenter in making a careful
record of the behavioural changes occurring as a result of manipulation of the independent
variable. There are several types of apparatus which are used in behavioural experimentations.
Hence, it is very difficult to enlist all apparatus. However, in the field of learning and
transfer of training, the card-sorting tray and the stylus maze board are commonly used. For
studying geometrical illusions, the Muller-Lyer illusion board is frequently utilized, and the
psychogalvanometer is also used in some studies. Sometimes, the apparatuses are not accurate and
may not be properly calibrated. Hence, the experimenter should be cautious in using apparatus of
any kind in the experiment.

6. Controlling extraneous variables: The experimenter must clearly identify those extraneous
variables which are likely to influence the dependent variable and prepare a plan to control
these variables so that the changes occurring in the dependent variable may be due only to the
manipulation of the independent variables. At the beginning of this chapter, a discussion has
already been made regarding the control of extraneous variables. Sometimes the experimenter is
faced with a situation in which the nature of extraneous variables is such that they are beyond
his control. In this type of situation, he may assume that the relevant variables are equally
influencing all the conditions of the experiment. If this assumption is tenable, he can proceed
with it, but if it becomes tenuous, it is better dropped.

7. Design of the experiment: There are several designs which are used in experimental
researches. Some of them have already been discussed in the preceding section. The selection of
the design is made keeping in view the general purpose and hypotheses of the experiment. If the
purpose of the experiment is to make a simple comparative study between two groups, the
experimenter may use the two-randomized-groups design. But if the purpose also involves the
evaluation of the effect of the interaction, he may prefer a factorial design.

8. Selection of subjects: At this step, the experimenter takes a decision regarding the modes
of selection of subjects and the ways they may form the groups. There are three steps, which are
directly concerned with selection and assignment of subjects to the different conditions of the
experiment. At the first step, the subjects are randomly selected from a wider well-defined group.
The wider group is commonly known as the population and may consist of inanimate objects or
living organisms. The selected number of subjects is usually called a sample. The experimenter
usually wishes to generalize the obtained results in regard to the wider group from which the
sample has been drawn. In order to achieve maximum generalization, he tries to draw a sample that
is representative of the population. Some of the techniques of random selection have been
discussed in Chapter 14. When the subjects are selected from the population, the next step is to
randomly assign them to different groups. Usually, the number of groups in any experiment is
determined by the number of conditions. Let us suppose that the experimenter wants to conduct an
experiment in which there are only two conditions. Hence, for this experiment there will be two
groups. Let us further assume that there are 30 subjects available for the purpose. At this
second step, these subjects may be randomly assigned to two groups. This may easily be done with
the help of the table of random numbers or with simple flipping of coins. Thus, 30 subjects are
randomly assigned to two groups. Now, one of the groups is to be designated as the experimental
group and the other as the control group. This constitutes the third step, which should also be
done in a random way by flipping a coin. The experimenter may predetermine that on flipping the
coin 'head' will designate the experimental group and 'tail' will designate the control group. In
this way, we see that the entire process of selection and assignment of subjects to different
conditions is carried out in a random way so that the experimenter can make a nearly perfect and
sound generalization with the minimum possible errors.

9. Procedure of the experiment: At this step, the experimenter makes a detailed plan
regarding the methods of giving treatments to subjects and recording the resulting data in
data-collection tables. The experimenter also specifies the instructions to the subjects. When the
experimental procedure is complex and lengthy, the experimenter may expect a few errors to
creep into the procedure. He should first try out a pilot experiment, the experimental procedure
of which might give some indication about those likely errors.

10. Statistical treatment: The obtained data in the experiment are evaluated by statistical
tests. There are several types of statistical tests, some of which are appropriate to one type of
experimental design and data and some are appropriate to another set of data and experimental
design. Not only this, these statistical tests rest upon certain assumptions. Before applying any
statistical test, it is reasonable to examine whether or not the obtained data satisfy those
assumptions. A statistic applied ignoring its assumptions is likely to yield unsatisfactory results
and hence, is not dependable. A detailed discussion of the common statistical tests appears in
Chapter 23. Needless to say, lack of appropriate statistical tests or a test selected without much
thinking and insight is likely to invalidate the results of the experiment.

11. Preparation of the discussion report: After statistically analyzing the data, a discussion
report is prepared where the experimenter explicitly writes whether or not the hypothesis is
supported by the obtained results. The experimenter also relates his findings with findings of
researchers who have already done experiments relating to the problem. Finally, he concludes
and summarizes his main findings under a separate heading.

12. Generalization of the obtained findings: The term 'generalization' here means that what
is true for the sample is also true for the whole population. The experimenter wants to generalize
his obtained findings for the whole population from which samples were randomly selected to
represent it. As a matter of fact, generalization is, to a great extent, dependent upon two factors.
The first is the factor of specificity, by which is meant the extent to which the population has
been well specified. A nonspecified population is vague and then, generalization may have little
meaning for the population. The second is the factor of representativeness of the sample. If the
sample has been selected in a manner which ensures maximum representativeness of the population,
generalization is likely to be correct and accurate.
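The three random steps described under selection of subjects (random selection of a sample, random assignment to two groups, and random designation of the experimental group) can be sketched in Python. This is a minimal sketch under stated assumptions: the population of 200 numbered subjects, the function name `select_and_assign` and the fixed seed are all hypothetical, and the pseudo-random generator stands in for the table of random numbers or the coin mentioned in the text.

```python
import random


def select_and_assign(population, sample_size, seed=None):
    """Randomly select a sample, split it into two groups, and label the groups at random."""
    rng = random.Random(seed)

    # Step 1: random selection of the sample from the population.
    sample = rng.sample(population, sample_size)

    # Step 2: random assignment to two groups (shuffle made explicit for clarity).
    rng.shuffle(sample)
    half = sample_size // 2
    first, second = sample[:half], sample[half:]

    # Step 3: a simulated coin flip decides which group is experimental.
    if rng.random() < 0.5:
        return {"experimental": first, "control": second}
    return {"experimental": second, "control": first}


population = list(range(1, 201))   # hypothetical population of 200 numbered subjects
groups = select_and_assign(population, 30, seed=42)
print(len(groups["experimental"]), len(groups["control"]))
```

Fixing the seed makes the allocation reproducible for record keeping; in an actual experiment the seed (or the physical randomizing device) should be chosen before the subjects are known.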

When the experiment has been conducted, data are analyzed and a conclusion is reached,
the experimenter then writes up the report of the experiment. Writing a research or experiment
report requires that the experimenter must have knowledge of a certain format according to
which reports are prepared. Chapter 24 is entirely devoted to this aspect of research work. In
general, it can be said that the conventions as laid down in the Publication Manual of the
American Psychological Association are quite satisfactory ones and can safely be adopted for
preparing a report.








Review Questions

1. What do you mean by research design? Discuss the basic principles of experimental design.
2. What is meant by factorial design? Discuss the advantages and disadvantages of factorial design.
3. Citing relevant examples, discuss the criteria of a research design.
4. Make a comparative study of the between-subjects design and the within-subjects design.
   Which of the two would you prefer for studying a social problem? Give reasons.
5. Distinguish between experimental design and quasi-experimental design with examples.
6. What reasons might a researcher have for choosing a within-subjects design?
7. Distinguish between the complete within-subjects design and the incomplete
   within-subjects design.
8. How do subject variables differ from manipulated independent variables in
   experimental research?
9. Discuss the importance of manipulation, holding conditions constant and balancing in
   ensuring the internal validity of an experiment.
10. Distinguish between random selection and random assignment in terms of their uses in
    experimental research.
11. Citing an example, make a comparative study of longitudinal design and cross-sectional
    design.
12. Discuss the basic features of cohort design.


22 
QUALITATIVE RESEARCH 


CHAPTER PREVIEW

• Meaning and Essential Features of Qualitative Research
• A Qualitative Research Model: Five Components
• Relevance of Qualitative Research
• Brief History of Qualitative Research
• Themes of Qualitative Research
• Theoretical Perspectives of Qualitative Research
• Research Design Strategies of Qualitative Research
• Sampling Techniques of Qualitative Research
• Data Collection Techniques in Qualitative Research
• Data Analysis and Interpretation
• Comparison of Methods of Qualitative and Quantitative Data Analysis
• Combining Qualitative and Quantitative Approaches

Qualitative research continues to proliferate: new methods and approaches keep appearing,
and more and more disciplines beyond psychology, education and sociology are taking it up
as a core part of their curriculum. In fact, qualitative research became particularly relevant
in the last decades of the twentieth century and at the beginning of the twenty-first century.
Let us first place qualitative research against its background. As we know, research
methodology can be divided into two major paradigms: the logical-positivist (or logical-positive)
paradigm and the phenomenological paradigm. The logical-positive paradigm is based upon the
assumptions of the natural sciences and utilizes experimental research methodologies such as
the scientific method, hypothesis testing, etc. Quantitative research is based upon the
logical-positive paradigm. Qualitative research is based upon the phenomenological paradigm,
which uses a variety of interpretive research methodologies.


MEANING AND ESSENTIAL FEATURES OF QUALITATIVE RESEARCH 


Confusion exists over just what qualitative research is. There are two main reasons behind it.
First, qualitative research is often defined negatively, that is, as whatever quantitative research
is not, rather than by a positive statement of what it is. Second, some researchers mistakenly
think that qualitative research is a unitary approach, whereas in reality qualitative research
consists of a variety of alternative approaches to the traditional, positivistic research
commonly found in the literature.

After acknowledging the above facts regarding the nature of qualitative research, let us
define it. Qualitative research is an umbrella term which encompasses enormous variety. Flick
(2009) has defined qualitative research as research where the researcher is “interested in
analyzing the subjective meaning or social production of issues, events or practices by collecting
non-standardized data and analyzing texts and images rather than numbers and statistics.”


In fact, qualitative research is more diverse in nature than quantitative research. [...]
structure at the design [...] and data are not organized in advance [...] empirical work
proceeds. Qualitative research is also much more eclectic [...] strategies and methods as
compared to quantitative research. Despite these diversities in methodological strategies,
qualitative research has some recurrent features (Tesch 1999; [...]):

(i) Qualitative research is conducted through a prolonged contact with a field
situation. These situations are reflective of the everyday life of persons, groups, societies
and organizations. In other words, it is naturalistic, preferring to study people, things,
behaviours, events, etc. in their natural settings.
(ii) The researcher aims to gain a holistic view of the context under study, that is, its logic, its
arrangements as well as explicit and implicit rules that are all under the target of
overview.
(iii) The researcher attempts to gather data on the perceptions of the local individuals from
the ‘inside’ through a process of deep attentiveness, of empathic understanding and of
suspending any bias or preconceptions about the topics under study.
(iv) After going through various materials, the qualitative researcher may identify certain
themes and expressions, which can be reviewed with informants but that must be
maintained in their original forms throughout the study.
(v) The major task of the researcher is to explicate the ways the individuals in a particular
setting come to understand, take action and otherwise manage their day-to-day
situations.
(vi) The researcher may make many interpretations of the materials but some may be more
compelling for theoretical reasons or on the grounds of internal consistency.
(vii) The qualitative researcher uses few standardized instruments at the outset. The researcher
himself is the main measurement device of the research.
(viii) Most analysis is done with words. The various appropriate words are assembled,
sub-clustered and broken into semiotic segments. These words are frequently organized
to permit the researcher to compare, contrast, analyze as well as bestow patterns upon
them.


A QUALITATIVE RESEARCH MODEL: FIVE COMPONENTS

In qualitative research the activities of collecting and analyzing data, developing and modifying
theory as well as identifying and dealing with various validity threats usually go on
simultaneously, each influencing all the others. Maxwell (1998) has proposed a model of
qualitative research in which five components for conducting qualitative research have been
outlined. In fact, these five components can be phrased as basic questions that the researcher
answers while conducting qualitative research. These components, as phrased in terms of
questions, may be discussed as under:

1. What are the goals of qualitative research?
The researcher sets the goal or purpose of the study or the questions that are hopefully to be
answered. According to Maxwell (1994) there are five specific purposes of qualitative research:
(i) To understand the way the participants in the study understand behaviours or events [...]
(ii) To understand the context or environment in which the participants exist
(iii) To identify some unanticipated phenomena and influences [...]
(iv) To understand those processes that lead to [...]
(v) To develop causal explanations in rare cases

2. What are the contexts in which the study is done?
[...] that is, what he thinks is going to [...] explain the behaviours and events [...]. The
researcher needs to understand [...] theory [...] also some personal experiences from which
he may be able to draw [...].

3. What does the researcher want to understand at the end of the study?
In qualitative research, the purpose and context need to be understood for the research questions
to be formulated. Like all components of the study, the questions are open to change while the
study is going on.

4. What will the researcher actually do?
In simple words, what approaches, strategies and methods will be used to collect data. This also
includes the sampling, that is, the question of who is being studied and how the [...]. In
qualitative research, the researcher prepares a tentative [...]. Maxwell [...] has emphasized
four components of methods used in qualitative research: the relationship that the researcher
establishes with participants; what settings or individuals and what other sources of information
the researcher decides to use; how the researcher gathers the information; and what the
researcher will do with the information [...].

So far as the relationship established between the researcher and the participant is
concerned, it can be complex and changeable. The role of the researcher can change from being
an observer to being a participant observer, as we find in some ethnographic studies. The second
component is concerned with sampling technique. In qualitative research, generally purposive
sampling is used since this provides the information desired. The basic aim here is that the sample
is useful in answering the questions raised by the researcher. The third component is regarding
gathering of data, which is, in fact, a broad topic. The common methods of gathering data are
observation, interviews and document analysis.

5. What alternative explanations may fit the findings?
Why should the findings of qualitative research be trusted? What threats are there to the validity
of the research? Following Maxwell, there are two critical aspects of research validity for
qualitative research: the bias of the researcher and the impact of the researcher's presence on the
participants or research settings. Researcher bias means the presence of some preconceived
notions and beliefs that may affect data collection. The reactivity of the participants to the
presence of the researcher is also important. Since both are in a way part of the study, the
researcher must explain and understand them rather than making an attempt to eliminate them
by concluding that they are not possible.

These five components, as spelled out in terms of questions, tend to [...] may change during
the study and therefore, they [...]. Thus these five components form an integrated and
interacting whole with each component closely tied to the others, rather than being linked simply
in a linear or cyclic sequence.

RELEVANCE OF QUALITATIVE RESEARCH

Why should one use qualitative research? [...] the special need to use [...] in the
current situation? The answers to these questions hint at the relevance of qualitative
research. It has been argued that qualitative research is of much relevance for the
study of social relations, probably due to the pluralization of life worlds, which refers to the
diversification of ways of living in one society. As we know, today the traditional classes
such as working class, middle class and upper class are no longer considered [...] for
describing modern society [...] many local, subcultural, etc. differences have
become equally important. In such pluralization of life worlds the key components [...]
obscurity (Habermas 1967), the growing individualization of ways of living (Beck 1999) and
the dissolution of old social inequalities into [...] importance.

Apart from this, postmodernists have argued that the era of big narratives and theories is
gone. As we know, [...] social theory that criticizes modernism and its concepts [...]
and science are produced into account (Flick 2009). [...] and situationally limited narratives
are the need of the day. These days [...] emphasis upon what is actually taking place in the
sphere of life [...]. Rapid social changes and the resulting [...] have forced social
scientists to use new social contexts and [...]. [...] deductive methodologies that require
hypotheses formulation [...] and test them against empirical evidence are [...]
differentiation of objects. These scientists are forced to make use of [...]. Instead of starting
from theories and testing them, they require some sensitizing concepts for studying varying social
contexts, for which qualitative research is considered important and much relevant (Bruner
1991).



QUALITATIVE RESEARCH: A BRIEF HISTORICAL INTRODUCTION 


If we pay attention to the history of qualitative research, it is obvious that psychology and other
disciplines of social sciences have a long history of using qualitative research. History bears
testimony that in psychology, Wilhelm Wundt used methods of description and verstehen in his
folk psychology along with his experimental research work. More or less at the same time, a
debate in German sociology was started between a science which showed inclination towards
induction as well as case studies, and an empirical and statistical approach. In American
sociology, methods such as biographical methods, case studies and descriptive methods
remained central till the 1940s. It was not until the 1960s that in American sociology the criticism
of standardized and quantifying social research became important (Glaser & Strauss 1967), and
this led to a renaissance of qualitative research in the social sciences as well as, with
some delay, in psychology (Willig & Stainton-Rogers 2007). As a consequence of the said
development in Germany and the United States, qualitative research progressed in phases. It is
also important to note that many important psychological insights, including the work of Piaget
and Vygotsky, have roots largely in qualitative research.


In the United States of America, Denzin and Lincoln (2005) have identified seven phases of
the development of qualitative research. The first phase is that of the traditional period ranging
from [...]. This period is related to the research of Malinowski (1916) and the dominant
Chicago School of Sociology. During this period, qualitative researchers were interested in the
other (that is, the foreign or strange) and in its objective description and interpretation.
The second phase is the modernist phase ranging from 1945 to the 1970s. During this
period, a serious attempt was made to formalize qualitative research by publishing more and more
textbooks. The spirit has been kept alive by Glaser and Strauss (1967), Strauss and Corbin
(1990) and by Miles and Huberman (1994). The third phase is that of blurred genres, which
continued until the mid-1980s. During this period, the researchers placed various theoretical
models as well as the understandings of objects and methods side by side so that they could choose
and compare various alternative models such as ethnomethodology, phenomenology, symbolic
interactionism, semiotics or feminism (Guba 1990). The fourth phase was the crisis of
representation, which began in the mid-1980s. The discussion of the crisis of representation in
the fields of artificial intelligence and ethnography had great impact upon qualitative research as
a whole. This made the process of displaying knowledge and findings a substantial part of the
research process. Consequently, qualitative research moved towards a continuous process of
constructing versions of reality. Whatever people narrate in the interview does not necessarily
correspond to whatever they had formulated at the moment the reported event happened. Likewise,
the version given at the interview may not necessarily correspond to the version they would have
given to a different researcher with a different research question. It meant that the researchers,
attempting to interpret the interview and present it as a part of their findings, tend to produce a
new version of the whole. The fifth phase started in the 1990s. During this phase, narratives
replaced theories or theories were read as narratives. Here the emphasis shifted towards such
theories and narratives that fit specific, local and historical problems. The sixth phase is marked
by post-experimental writing as well as linking issues of qualitative research to some democratic
policies (Flick 2009). The seventh phase covers roughly the period of 2000 to 2004 and is
characterized by establishing qualitative research through various successful and new journals.
Later, evidence-based practice emerged as the new criterion of relevance for social science and is
counted as the eighth phase in the development of qualitative research as claimed by Denzin and
Lincoln (2005).


So far as the development in Germany is concerned, Habermas (1967) first recognized
that a new tradition and discussion of research was developing in American sociology, related to
names like Cicourel, Goffman, etc. Right from this period, the model of research process put
forward by Glaser and Strauss (1967) attracted the attention of researchers. Here, the basic aim
was to do, so far as possible, more justice to the objects of research than is possible in
quantitative research. Kleining (1982) emphasized understanding the object of research as
preliminary till the end of the research, because the object would come to its form only at the end.
A discussion about a naturalistic sociology and the principle of openness also provided strong
impetus for the growth and development of qualitative research (Hoffmann-Riem 1980). At the end
of the 1970s, a still broader discussion started in Germany. This discussion dealt exclusively with
interviews, how to apply and analyze them, as well as with methodological questions, which
stimulated extensive research. The main question of this period was to determine whether these
developments should be perceived as a fashion or as something like a new beginning.

In Germany, two original methods were picked up at the beginning of the 1980s and these
two methods were considered crucial for the development of qualitative research. These two
methods were narrative interview and objective hermeneutics. As we know, narrative interview
is a specific form of interview based on one extensive narrative. Here, instead of asking questions,
the interviewer asks the persons or subjects to tell the story of their lives as a whole, without
interrupting them by asking questions. Objective hermeneutics is a way of doing research by
analyzing texts to identify their latent meaning. For example, the transcript of a family
interaction may be analyzed for identifying and elaborating an implicit conflict underlying the
communication of the members involved in the interaction. Both these methods stimulated
extensive research practice in biographical research (Rosenthal 2004). In the middle of the
1980s, researchers also started showing concern for the problems of validity and generalizability
of the obtained results. The unstructured nature of the data obtained by the use of these methods
[...] research (Richards & Richards 1999, [...]).

So far as the qualitative orientation in India is concerned, it is relatively [...]
([...] 2007). [...] research [...] is still in [...] development (Prakash 2011). Since
qualitative research questions many of the well-established assumptions about offering the
certitude enjoyed by quantitative methods, it is not very popular and therefore, only recently
have psychologists started using qualitative research. In Indian psychology, qualitative research
has been practised through narratives (Bharat & Aggleton 1999; [...]), observation (Babu et al.
[...]), ethnography (Priya [...]; Vohra 2001), textual analysis ([...]), case study ([...]),
participatory rural appraisal ([...]), focus group discussion ([...]), conversational analysis
([...]), and grounded theory and biographical analysis ([...] & Bhargava 2001).

Thus we find that the development of qualitative research in various countries has gradually
reached a very crucial stage where, on some points, it is being considered more important than
ever before.

THEMES OF QUALITATIVE RESEARCH


A variety of themes characterize qualitative research. According to Patton (2002), there are
twelve such themes. These twelve themes are divided into three major categories: design
strategies, data collection and fieldwork strategies, and analysis strategies. A discussion follows.

1. Design strategies: All qualitative research uses design strategies, which may be grouped
under three heads: emergent design flexibility, purposive sampling and naturalistic inquiry.

In emergent design flexibility, as its name implies, the researcher avoids getting locked into
any rigid design and he may bring a change in his research work as the data are being collected
and the samples are selected for their usefulness. This design flexibility is a staple of qualitative
research that distinguishes it from quantitative research, where the entire research design is
spelled out before any data are collected and no changes are made during the study. A
qualitative researcher has some ideas about the research design but it is open to change
according to the situation and as data are collected and analyzed. This allows the researcher to
make corrections in the design for adjusting to new information and the situation.

In qualitative research, the researcher gives preference to purposive sampling in comparison
to the other forms of sampling. In quantitative research, purposive sampling is rarely used. A
qualitative researcher prefers purposive sampling (or a particular sample of persons or
documents) because such persons are ‘information rich’ and ‘illuminative’. In other words, they
offer useful and desired information regarding the phenomenon of interest.

The third design strategy is naturalistic inquiry, also known as ethnographic studies. Here the
researchers study situations as they unfold naturally. They make observations without
attempting to control the situation.

2. Data collection and fieldwork strategies: According to Patton (2002), there are four
data collection and fieldwork strategies in any qualitative research: qualitative data, personal
experience and engagement, empathetic neutrality and mindfulness, and dynamic systems.

Qualitative data include inquiry, in-depth interviews that capture direct quotations about
persons' experience, case studies, observations yielding detailed and thick observation, and
document review. The very nature of qualitative data makes qualitative research
powerful. In fact, the richness of the obtained data permits a more in-depth understanding of what
is being studied than could be derived from quantitative research.

The strategy of personal experience and engagement means that the researcher has direct
contact with the people, situation and phenomenon under study by going into the real world of
people, organizations, neighbourhoods and street corners. In fact, such direct contact makes
possible a description and understanding of both observable behaviours and internal states. Here
the researcher's personal experiences and insights are considered an important part of the inquiry
for understanding the behaviour under study.

Empathetic neutrality and mindfulness is another important strategy. Here the researcher has
to be empathetic and neutral at the same time in dealing with his participants or subjects.
This is a difficult task. The researcher tries to understand the feelings of the subjects as
well as show empathy towards them by displaying openness, sensitivity, respect and
[...] while at the same time maintaining perspective as a good researcher. The
researcher tends to report on any sources of bias and uses multiple sources for producing
trustworthy results.

The dynamic systems strategy assumes that change is an ongoing process, whether the focus is
on an individual or an entire community. A single-time picture of the individual or community
is not enough and often proves to be misleading. Any individual or community can be
understood only by observing the various changes that occur over the course of the study. The
qualitative researchers observe these changes over time and find a link between the various
changes and processes that occur.


3. Analysis strategies: There are five analysis strategies in qualitative research. These are
unique case orientation, inductive analysis, holistic perspective, context sensitivity, and voice,
perspective and reflexivity.

Unique case orientation means that the researcher treats each case as a unique one. The
researcher’s own training and background regarding the case also add to the unique quality of
the data. For example, the data from a case study of a mentally retarded infant, and therefore the
resulting report, would differ depending on whether the researcher's background and training were
in the field of medicine, social work or special education. As a result, what the reader will know
about the case would obviously depend upon the data collected and the perspective of the
researcher.


Inductive analysis is another analysis strategy that allows the researcher to explore and
analyze the data without prior hypotheses. It helps the researcher to discover reality without
fitting it into a preconceived theoretical perspective. Such openness to find whatever there is to
find has been considered unique to qualitative research and makes it distinct from quantitative
research, which is based on hypotheses generated from theory, prior research or some models.

Holistic perspective is another strategy. Here the researcher concentrates on the whole 
phenomenon under study that is considered as more than a sum of its parts. Here the focus is on 
complex interdependencies and system dynamics that cannot be reduced to a few discrete 
variables and cause-effect relationship. Thus, this perspective is important for understanding the 
complex nature of many aspects of human behaviour. 


Context sensitivity is another analysis strategy. It means that qualitative data are very
sensitive to the social, historical and temporal context in which they were collected. That is the
reason why data are not generalized to other contexts socially, spatially or temporally. The fact
that something occurs in one family at a particular time does not mean that a similar thing
should occur in some other family also.


Voice, perspective and reflexivity is another important strategy. A qualitative researcher has
to be reflective about his own voice and perspective. Here reflexivity is used [...]
perspective, which requires the qualitative researcher to be attentive to the cultural, [...]
social, linguistic and ideological aspects of his own perspective and the voices of those [...]
and to those for whom the reports are prepared.

Thus we find that the themes of qualitative research are broad ones, and this also shows
[...] distinctions from quantitative research on several points.


THEORETICAL PERSPECTIVES OF QUALITATIVE RESEARCH

There are different theoretical perspectives of the methodologies subsumed under
qualitative research. According to Jacob (198[...]) there are six such perspectives: two from
ecological psychology (behavioural settings and specimen records), three from anthropology
(holistic ethnography, ethnography of communication and cognitive anthropology) and one from
social psychology (symbolic interactionism). [...] system is useful [...]
not comprehensive. At the same time, it [...] segmenting some fields [...]
two types of ethnography and two areas of ecological psychology.

A much wider system of categorizing theoretical perspectives has been provided by
Patton (2002). He has provided 16 theoretical perspectives of qualitative research. A brief
discussion follows:

1. Ethnography: The disciplinary root of ethnography is anthropology, which includes the
study of cultural phenomena. Ethnography, besides being a theoretical perspective, is also a
method of doing qualitative research. The major purpose of ethnographic research is to
determine the physical and social environment in which the persons under study live and [...].
As a research strategy, the researcher here combines different methods but bases it on [...]
participation, observation and writing about the field or situation under study. For example,
suppose the qualitative researcher wants to study how homeless and parentless adolescents deal
with their health problems. Here he would combine participant observation in their community
with interviewing them. The resulting picture of the details from the participation, observation
and interviewing is unfolded in a written text about the field.

7. Ethnomethodology: Ethnomethodology stemmed [...]. Here the major purpose
of the researcher is to answer the question: In what ways do people make sense of their everyday
activities so that they may be able to behave in socially acceptable ways?

8. Symbolic Interaction: Symbolic interaction stems from social psychology and is also
used in linguistics to the same extent. This theoretical perspective of qualitative research is based
upon the assumption that people act and interact on the basis of the meaning of [...] and
interpretation. Another assumption is that the meaning of objects, things [...] is derived from
the social interaction that a person has with [...]. Yet another assumption has been that these
meanings are modified through the interpretative process used by the persons in dealing with the
things, objects, etc., they encounter (Blumer 1969). Here the researcher tries to answer the
question: What common set of symbols and understandings has emerged to give meanings to the
interactions of the individuals?

9. Ecological psychology: Ecological psychology emerges from ecology and psychology.
Here the researcher tries to answer a central question like: How do persons try to accomplish
their goals through specific behaviours in specific environments? This perspective attempts to
understand the [...] between persons' behaviour and their environment. Ecological
psychologists strongly emphasize that behaviour is goal-directed. They collect data by observing
the person and the environment. They try to keep detailed descriptions of both and then code
these descriptions for numerical analysis. In this way, the methodology is qualitative whereas the
data analysis remains quantitative.

10. Semiotics: Semiotics stems from linguistics. Here the basic question to consider is:
How do various signs such as words, symbols, etc., convey meaning [...]?

11. Hermeneutics: The disciplinary [...] literary [...]. The central question is: What are the
conditions under which [...] act takes place that [...]?

12. Narrative analysis/narratology: Narrative analysis/narratology stems from social
sciences, literary criticism and literary nonfiction. Here the central questions before the
researchers are: What does the story or [...] from which it came? How can one
interpret [...] life and culture that created it?


+. Autoethnography: Autoethnography stems from literary arts. Here 
interested in answering questions 


| , ng q | like: how does own experience of this cu Iture connect with 13. System theory: System theory stems from interdisc 
and tend to offer insights into this culture. event, situation and the way Of life of the inhabitants! with which the researcher is concerned here is: How and 

3. Reality testing: Positivist and Realist approach—Reality testing stems from philosophy, function as it does? 
social oe and evaluation. Here the researcher is interested in finding answers to several 14. Chaos theory: Nonlinear dynamics—Chaos 
oe Ike: what is really going on in the real world? What is the truth in so far as we can get it? natural sciences. Here the central question of 

aw can we study a phenomenon so that the conclusion May Correspond to the real world as fat disorderly phenomenon? 
as possible? 
4. nstructionismY/Construct 


'5. Grounded theory: Grounded theory stems from social sciences 
a Grounded theory is a method, an approach 
'S SOCiology. Here the researcher's } sta 
setting or situation constructe 


Narrative reveal about the indix idual 


and the environment 
rpret this narr 


ative tor understanding and illuminating the 
the researchers av 


iplinary approach. The question 
why does this system as a whole 


theory stems from theoretical physics and 
concern is: What is the underlying order, if any, of 
iets The disciplinary root of constructionism/constructivism 
alm 1s to answer the 


and methodology. 
question: how have the individuals ina 


, astrategy and therefore, itis nota theory at all. It can 


best hedeniaad as a research strategy whose basic purpose is to generate theory from data. 

“a ai ed reality? What are their reported perceptions, beliefs and ‘Grounded’ clearly means that the theory will be generated on the basis of data. el will, 
nsequences of thejy constructions for their behaviour?, etc. therefore, be grounded in data. Theory here means that the objective of collecting and ana yzing 
5: Phenomenology: Phenomenology stems fro 

and analysis of the ; | 


™ philosophy. Here careful description data obtained from research is 


'd, and the meaning making and understanding in thatlilé theory is that the 


world are done. The researcher js basically interested 


to generate theory, Therefore, the essential data in grounded 
meaning, structure and 


ory would be developed inductively from data. Grounded theory, besides being 


in answering questions like: What i the a theoretical perspective, is an overall strategy tor doing research, Here ai Fa eed : 
; ns | ge : nicknames ive analysis e obtaine 
“ssence'of the lived exer | 3 “oncern is; What theory actually emerge from systematic Comparative analysis o 
BrOup of persons? © lived experience of a phenomenon for the persons of data? SE A . 

6. istic j gre wies x ta etc —Feminist inquiry, critical theor 
Magee INquITY: Heuristic inquiry stems from humanict: -hology. Here the 16. Orientational: Feminist inquiry, Critical theory, etc, gas ee cenit 
re. Nes to answer the QUEStion: What ic ‘a AManistic psycho By. ve alto queer theory, etc., stem from political, cultural and economic ideologies. 

Perienced the same Phenomenon intensely? © experience of others who have | 


‘ ‘toch ie . | T Study? As 
Question of concern is: How feminist perspective manifest in the phenomenon under study 


we know, feminist thinking has been making an important contribution to social [...] since the
1960s, playing a particularly significant role in qualitative research methods (Olesen 1994;
Roman 1992). Feminist approaches [...] of women's life situation and of male dominance. Feminist
research was [...] by the use of qualitative research due to the methods opening up more to
women's voices and needs in general. Mies (1983) has pointed out reasons why feminist research is
more linked to qualitative research than to quantitative research. Quantitative research often
[...] voices of women, [...] them into objects, and they are often studied in [...]
value-neutral [...]. Qualitative research [...] voices to be heard and goals realized. As Ussher
(1999) pointed out, [...] research focused on a critical analysis of gender relationships in
research and theory... and the recognition of the need for social change to improve the lives of
women.

Thus we find that there are different theoretical perspectives. Of these various perspectives,
three, namely ethnography, symbolic interactionism and ecological psychology, are very common
ones.

RESEARCH DESIGN STRATEGIES OF QUALITATIVE RESEARCH

As we know, research design is a plan for collecting and analyzing evidence, which enables the
researcher to answer whatever questions he has posed. In fact, research design touches almost all
aspects of the research, from the minute details of data collection to the selection of the
various techniques of data collection. Qualitative research generally uses the following popular
research design strategies:

1. Ethnography or ethnographic studies
2. Case studies
3. Document or content analysis
4. Grounded theory
5. Retrospective studies

1. Ethnography or ethnographic studies: Ethnography, sometimes known [...] anthropology or
recently as naturalistic inquiry is, in fact, a method of field study [...] became popular in the
second half of the 19th century. The term 'ethnography' itself comes from cultural anthropology.
'Ethno' means people or folk, and 'graphy' means describing something. Thus ethnography means
describing a culture and understanding a way of life from the viewpoint of its participants. In
still simpler words, it can be said that ethnography is the study of people in naturally
occurring settings [...] by means of methods that capture their ordinary activities and their
social meanings, involving the investigator participating directly in the setting for collecting
data in a significant manner without anything imposed on them externally.

Hammersley and Atkinson (1995), [...] authorities in the field, have described ethnography by
taking a fairly liberal view, whereby the researcher participates, overtly or covertly, in the
people's daily lives for an extended period of time, watching what happens, listening to what is
said, asking [...] and collecting relevant data. They have emphasized ethnographic connection
[...] a way of doing social research developed by ethnographers in face [...] by positivism.

Some important studies involving ethnography have been done in the Indian context. Sarangpani
(2003) conducted an ethnographic study of teacher-taught relationship in a village in North
Delhi. It was an interaction between those who know (teachers) and those who came to know
(schoolchildren). Children perceived schooling as something that makes them good citizens and
empowers them to move away from many traditional jobs. Teachers' voice and memorization were
considered important [...] and recalling were perceived as the key criteria of intelligence.
Priya (2002) conducted [...] study for investigating the processes of suffering and healing among
the [...] of the Bhuj earthquake that brought havoc on 26 January 2001. For understanding the
processes of suffering and healing, she used a semi-structured interview [...]. It was reported
that the villagers expressed a strong belief that doing their duties [...] bring harmony and
peace, which would lead to healing among survivors. On the other hand, victims in urban areas
were, in fact, more materialistically oriented. Likewise, in an ethnographic study of an
educational institution named Mirambika at Delhi, Sibia and Raina (2001) used [...] informal
interactions and questionnaires over a [...] months. Thematic data analysis revealed that
learning features and interpretations of classroom were drawn by inductive analysis. Validity of
the interpretation was established through triangulation. The study provided a kaleidoscopic view
of curriculum organization, teaching styles, disciplinary techniques and seating patterns of
classrooms of Mirambika. Similarly, using ethnographic field study in Bhubaneswar, the domestic
lives of women and families have been investigated by Menon and Shweder (1998) and Seymour
(1999). In these studies a fine blending of observation, interview [...] was done and they
offered an insightful mapping of the self and family situated in the social context.

There are some features of ethnographic studies. The central feature of ethnography is to study
and understand the cultural and symbolic aspects of behaviour and the context of that behaviour,
whatever may be the specific focus of research. In ethnographic study the specific focus of
research is typically either some group of people or a small number of cases or just one case, in
detail. In addition to these central characteristics, ethnographic studies have the following six
characteristics.

(i) The ethnographer starts with the assumption that the shared cultural meanings of a group
are crucial to understanding its behaviour. Any group of persons like prisoners, tribals,
patients, etc., develops a life of its own that becomes meaningful and reasonable once the
researcher gets close to it. The ethnographer's primary task is to uncover such meaning
(Goffman 1961).

(ii) In ethnographic study the researcher is sensitive to the meanings that behaviour, actions,
events, etc., have in the eyes of the people being studied. Therefore, a major task of the
ethnographer is to elicit that meaning or knowledge from the insider's perspective.

(iii) The ethnographer studies the group or case in its natural settings. True ethnography,
therefore, involves the researcher becoming part of that natural setting (Fielding 1996).

(iv) Ethnography is not a prestructured study, rather an unfolding and evolving one. It means
that it will not normally be clear what to study in depth until some progress in research work
has been done. While specific research problems [...] used in the research, they are more
likely to develop as the study proceeds. This principle also applies to data-collection
procedures and to data.

(v) Ethnographic data collection is typically prolonged and repetitive. There is both a
specific and a general reason for this (Woods 1992). The specific reason is that the [...]
needs a comprehensive and detailed record [...] time and again. The general reason is that
[...] symbolic importance and cultural interpreta[...] takes time for the ethnographer to
arrive at the deeper and important levels of reality.

(vi) From the viewpoint of data collection, ethnography is eclectic and not restrictive. The
continuum of ethnographic field work may range from direct nonparticipant observation
to participant observation, then to interviewing with one or more informants, and then to the
words of the people themselves.

When is the ethnographic approach considered appropriate? In general, the ethnographic approach
is considered most appropriate [...] behaviour and the symbolic meanings and the significance of
the behaviour within the context [...]. In fact, it is an excellent way of developing insight
into a culture or social process, particularly those involving other cultures, including those of
organizations and institutions.

A major limitation of ethnographic study is the danger of methodological arbitrariness [...] the
approach shows flexibility [...] the subject under study. Many early ethnographic studies were
criticized on the grounds [...] time [...] people of the tribe or individuals to be studied to
get more than a superficial view. Later [...] developed a clear realization that ethnographic
studies would be meaningless and invalid if the researchers have not met the following
conditions:
(i) The researcher must live for a long and extensive period of time among the
individuals/tribe and become an integrated member of the cultural group.

(ii) The researcher must learn the native language of the cultural or social group so that he
is able to develop the sensitivity to think, feel and interpret observations in terms
of the cultural group's concepts, feelings and values.

(iii) The researcher must have trained his informants to systematically record field data in 
their own language and cultural perspective. 


2. Case studies: In qualitative research, as in quantitative research, case studies are a
popular research strategy. A case study is a way of organizing social data for the purpose of
knowing about social reality. It examines a social unit as a whole, and the unit may be a person,
a family, a community, a social group, a social institution or a nation. It could also be a
decision, a policy, a process, or an event or incident of some sort, and there are other
possibilities as well. According to Brewer and Hunter (1989) there can be six types of units (or
cases) which can be studied: individuals; attributes of individuals; actions and interactions;
residues and artifacts of behaviour; settings, incidents and events; and collectivities. Any of
these units can be in focus in a case study.


Obviously, then, there are different types of cases and therefore, there can be different types 
of case studies. Stake (1994) has distinguished between the following three types of case studies: 


(i) Intrinsic case study: It is defined as a case study where the study is undertaken because
the researcher wants a better understanding of the case selected for study.

(ii) Instrumental case study: It refers to a case study where a particular case is examined to
give insight into an issue, or to refine a theory.

(iii) Collective case study: It refers to a case study where the instrumental case study is
extended to cover several cases [...] general condition. It is also known as multiple case
study or comparative case study.

The first two types are single case studies where the researcher focuses within the case,
whereas the third is a multiple case study where the researcher focuses both within and across
cases.
Various definitions of case study have been offered. According to Stake (1988), case study is
defined as a study of a bounded system, emphasizing the unity and wholeness of that system but
confining attention only to those aspects that are relevant to the research problem. Yin (1984)
has defined case study as an empirical inquiry that investigates a contemporary phenomenon within
its real-life context when the boundary between the context and phenomenon is vague and not
clearly evident and in which multiple sources of evidence are used. Following Goode and Hatt
(1981), case study is not a specific technique; rather, it is a way of organizing social data so
as to preserve the unitary character of the social object to be studied.

* For a further discussion, the reader is also referred to Chapter 15.

Analyzing these various definitions, we get the following four characteristics of case study : 

(i) A case is a bounded system, that is, it has boundaries. Generally, the boundaries
between the case and the context are not necessarily evident but the researcher needs to
identify and describe the boundaries of the case as clearly as possible.

(ii) The case is a case of something. The researcher needs to identify what the case is a case
of; this is of importance in determining the unit of analysis.

(iii) In case study there is an obvious attempt to preserve the wholeness, unity and integrity of 
the case. 

(iv) In case study, multiple sources of data and multiple data collection methods are used in 
naturalistic setting. Some case studies prefer to use sociological and anthropological 
field methods such as observations in natural settings, interviews, and several narrative 
reports whereas some case studies may use questionnaires and some kind of numerical 
data. It obviously means, then, that case study is not necessarily a qualitative technique;
it is also used as a quantitative technique. However, most case studies are predominantly
qualitative. 

Yin (1998) has listed six types of data of which researchers using case study must be aware.

(a) Archival record: Archival records are usually quantitative data that include survey, 
questionnaires, records of organizations, results from prior studies, etc. 

(b) Documents: Documents include materials like school records, health records, official 
notes, etc. 

(c) Direct observation: When the researcher visits the case study site, he prepares some 
field notes based upon his observation. 

(d) Participant observation: It is observation of the field situation by the researcher who
becomes a participant in the situation being observed. This provides important data for
some case studies, such as when a college student wants to know how a seminar meant for
college students operates.

(e) Interview: In case studies, interviews are open-ended in nature. The interviewees are
considered as informants for these data and the verbal reports of the interviews should be
triangulated for later verification.

(f) Physical artifacts: This is used in physical anthropological studies of earlier times.
Arranging a classroom to understand the class as a case would be an example.

Those willing to conduct research using case study need to follow the steps described
below.
(i) They should be clear on what the case is, including identification of its boundaries. 

(ii) They should be clear on the need for the study of this case as well as regarding general 
purpose of the concerned case study. 

(iii) They should translate general purpose into specific purpose and research questions.

(iv) They should identify the overall strategy of the case study, especially whether it is one
case or multiple cases.

(v) They should clearly specify what data are to be collected and from whom and how. 


(vi) They should make explicit how the data will be analysed. 


Indian psychologists have used case studies in different settings such as clinical settings,
organizations, etc. Dosajh and Dosajh (1999) used case study to examine the case of a
45-year-old man with two children who worked as assistant manager in a bank. He was on medical
leave and was planning to take [...] retirement due to intense pain in the lumbar region. It was
[...] that his disturbed family [...], sexual relations and depression had played a significant
role in his illness. As a therapeutic measure, the senior officers were requested to change [...]
attitudes towards the patient. The [...] recovered and resumed his duties [...] cooperation
rendered by [...] members of the family. Likewise, case studies [...] women's problems such as
physical assault (Gandhi 19[.]5) as well as the effectiveness of compromise on the complainant
(woman victim) and the [...] (such as husband and in-laws) (Chopra 1999).


3. Document or content analysis: The reader is referred to Chapter 12 for details of document or
content analysis.

4. Grounded theory: Grounded theory is both a research strategy and a way of analyzing data. In
this section, we shall discuss grounded theory only as a research strategy, and in the [...]
section we shall deal with it as a way of analyzing data. As has been explained earlier, grounded
theory was developed by Glaser and Strauss (1967) when they were doing research from a
sociological perspective in organizational contexts. In other words, it was developed as a method
for the study of social behaviour, especially complex behaviour. The essential idea in grounded
theory is that theory will be developed inductively from data. Grounded theory, thus, is an
overall strategy for doing research. For implementing that strategy, grounded theory has a
particular set of techniques and procedures.
Now let us look at the historical background of grounded theory. The history of grounded theory
is a short one and can be traced mainly through its five key publications. In the 1960s, Glaser
and Strauss started collaborative work in medical sociology and published two very important
studies of dying in hospitals (Glaser & Strauss 1965; 1968). These two publications were
Awareness of Dying in 1965 and Time for Dying in 1968. These two books proved to be very
important ones and represented a different style of empirically based sociology. In response to
the numerous requests made by the readers of those two books, Glaser and Strauss wrote another
book and it was published in 1967 under the title The Discovery of Grounded Theory, which was
regarded as the first important publication about grounded theory. This book had three major
purposes to serve: to offer the rationale for theory that was grounded, to suggest the logic for
and some specifics of grounded theory, and to legitimate careful qualitative research. Later on,
Glaser and Strauss taught a grounded-theory-style seminar in qualitative analysis at the
University of California. Many researches using grounded theory to investigate a variety of
phenomena were published later.

The second key publication, entitled Theoretical Sensitivity, was released by Glaser in 1978,
almost 11 years after the publication of The Discovery of Grounded Theory. The book focused
mainly upon updating methodological development in grounded theory and helping the researcher
develop theoretical sensitivity. The third key publication, entitled Qualitative Analysis for
Social Scientists, came out in 1987, nine years after the second key publication. In this book,
emphasis was given to qualitative analysis in general. This book was regarded as [...]
understanding of social behaviour through a particular style [...].

The fourth key publication, entitled Basics of Qualitative Research and subtitled Grounded
Theory Procedures and Techniques, was done by Strauss and Corbin in 1990. This book proved to be
very useful to all those researchers in different disciplines who aimed to build theory by means
of qualitative data analysis. In fact, the book presented the analytic mode of grounded theory.
The fifth important publication was the result of revision of Strauss and Corbin's book by
Glaser. The book written by Glaser was titled Basics of Grounded Theory Analysis and subtitled
Emergence versus Forcing. The book was published in 1992. In this book, Glaser has tried to
correct all those misconceptions about grounded theory that were evident in Strauss and Corbin's
book.


Besides these methodological works on grounded theory, others having impact upon the development
of grounded theory could also be mentioned, like those by Charmaz (1983) as well as by Chenitz
and Swanson (1986). Mention should also be made of two other works by Glaser, Examples of
Grounded Theory: A Reader (1993) and More Grounded Theory Methodology: A Reader (1994), through
which his criticism of Strauss and Corbin's work continued. Finally, Strauss and Corbin presented
an overview of grounded theory methodology in a chapter of the book entitled Handbook of
Qualitative Research [...].

Although grounded theory is a research strategy and method of analysis of qualitative data, it is
equally applicable to quantitative data in Glaser's view (1992). Originally developed in
sociology, grounded theory is, in fact, a general way of approaching research and does not depend
on any particular disciplinary perspective.

Grounded theory has the explicit purpose of the generation of theory from data. In grounded
theory that aims to generate theory, no up-front theory is proposed nor is any hypothesis
formulated for testing. In fact, the researchers have to start with an open mind, aiming to end
up with a theory.

As far as the use of literature in grounded theory is concerned, it has a different perspective
in comparison to other research approaches. The difference lies in how the literature is dealt
with and when it is introduced, and follows from the emphasis that grounded theory places on
theory generation. If a satisfactory theory already exists on a particular topic, there is no
sense in undertaking a study to generate a new theory about it. The obvious rationale for doing a
grounded theory study is that the researcher has no satisfactory theory on the topic. In such a
case, the researcher wants to approach the data as open-mindedly as possible, guided by research
questions. The problem with reviewing the literature in advance of such a study is that it may
strongly influence the researcher when he actually begins working with the data. In such a case,
it makes sense to delay the review of literature at least until the conceptual directions within
the data have become clear. Here the researcher will introduce the literature later, seeing the
literature as further data for study. Therefore, the key principle in using review of literature
in grounded theory is that literature is seen as further data to be fed into analysis, but at a
stage in the data analysis when theoretical directions have become more or less clear.

5. Retrospective studies: Retrospective studies, as their name implies, are defined as those
studies in which some events and processes are analyzed retrospectively, from the point in time
when the research is carried out, in respect of their meanings for different persons or for their
collective life histories (Flick 2009). Biographical research is a good example of retrospective
studies. Here those persons are selected as informants who will be meaningful and relevant for
the process to be investigated. Data are often (but not necessarily) collected with the help of
narrative methods. They are analyzed with narrative or hermeneutic approaches. The basic aim of
the research here is to develop theories from the data that are analyzed.

One danger in any retrospective research is that the current situation in which an event is
recounted tends to overlap the earlier situation or influences the assessment of the past event.
Moreover, the perspective on the processes that are analyzed may be distilled from the viewpoint
of interviewers or from studying such documents as have been produced or filed.

Of these various research design strategies, the qualitative researcher may select any one or
may combine any two (or sometimes more than two) designs depending upon the research
problem being investigated.


SAMPLING TECHNIQUES OF QUALITATIVE RESEARCH 
In qualitative research sampling is a very important step, as it is in quantitative research. 
However, there is difference in sampling of these two approaches. In quantitative research, the 


by Methods TL DEMON ar ee es iam 


: ‘ ceva 
570 Test, Meastiremenis and Reset 


7 often, probability sampling is used that jc 4. 
focus is on people sapling Me vexwenentantverieal the findings from hae Feet at 
representativeness. Becalise “ual ative research rarely uses probability samplin i, ate 
generalized to the population. jsive sampling is the term frequently used. ft sbves a athe, 
uses deliberate pike ees way, with some purpose in mind. This chile Gang 
net caer . poo ii events because they can readily provide desired information Sets 
daslesiatetial the sample is useful in answering the questions fale? by the resea rcher, ™ The 
There is no simple typology of strategies for calla in ee research due tO the 
variety of research approaches, purposes and settings. ue ‘ak repel 994) have 
qualitative sampling strategies in typology. Patton y i coi ear and Jane 
have contributed still other strategies. In general, the following 12 sampling strateg 
popular in qualitative researches. | 
(i) Complete collection: In this sampling strategy, the structure Of group taken : 
account is defined before data collection. Gerhardt (1986) used this sampling Stratepy | 
study about events and courses of patients’ careers in chronic renal failure in south-e 
In order to know about events and courses of such study, she decided to do a comp! 
of all patients (male, married, 30 to 50 years at the beginning of the treatment) of the five Major 
hospitals (that is, renal units) serving the south-east of Britain, Here we find that sampling i 
limited in advance by certain criteria: a specific disease, a specific age, a limited Period, 4 
specific region and a particular marital status. On the basis of these criteria, relevant Cases aie 
taken. Here sampling is carried out because virtual cases, which don’t meet 


wt one oF more of thes 
criteria, are excluded in advance. Such method of sampling is common in rural studies. 


Since in this sampling strategy the structure of the groups taken into account is defined 
before data collection is done, it restricts the range of variation in possible COMparison 
Therefore, at least at this level, there will be no real new findings. 


(ii) Theoretical sampling: Theoretical sampling, developed by Glaser and Strauss (1967) in
medical sociology, is a sampling procedure in grounded theory research where cases, groups or
materials are taken according to their relevance for the theory that is developed and on the
background of what is already the state of knowledge after collecting and analyzing a certain
number of cases. Paraphrasing the viewpoints of Glaser and Strauss regarding theoretical
sampling, it can be said that it is basically a process of data collection for generating theory
whereby the researcher jointly collects, codes and analyzes his data and decides what type of
data be collected next and where to find them in order to develop his theory as it emerges. Such a
process of data collection is controlled by what is called the emerging theory. The most important
principle of theoretical sampling is the idea that subsequent data collection should be guided by
theoretical developments that emerge in the analysis. In other words, the basic principle of
theoretical sampling is the genuine and typical form of selecting cases or materials in qualitative
research.




Sampling decisions in theoretical sampling may start from the level of groups to be
compared or they may directly focus on specific persons. A very important question in theoretical
sampling is how to decide when to stop integrating further cases. Glaser and Strauss have
suggested the criterion of theoretical saturation (of a category, etc.). By theoretical saturation is
meant a point in grounded theory research at which more data about a theoretical category
does not produce any further theoretical insight (that is, nothing new emerges). Sampling and
integrating further material is finished when theoretical saturation of a category or group of
cases has been reached.
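
The stopping logic of theoretical saturation can be illustrated with a small sketch. It is only an illustration under stated assumptions: Glaser and Strauss do not give a numerical rule, so the `patience` threshold (how many consecutive cases may add nothing new before sampling stops), the function name and the example categories are all hypothetical.

```python
from typing import Iterable, Set, Tuple


def cases_until_saturation(coded_cases: Iterable[Set[str]],
                           patience: int = 3) -> Tuple[int, Set[str]]:
    """Integrate cases one by one until nothing new emerges.

    Each case is represented by the set of categories coded in it.
    Sampling stops once `patience` consecutive cases add no new
    category (an assumed, hypothetical operationalization of
    saturation). Returns (number of cases integrated, categories seen).
    """
    seen: Set[str] = set()
    quiet_run = 0   # consecutive cases that added no new category
    used = 0        # cases integrated so far
    for codes in coded_cases:
        used += 1
        new_codes = codes - seen
        seen |= codes
        quiet_run = 0 if new_codes else quiet_run + 1
        if quiet_run >= patience:
            break
    return used, seen


# Hypothetical coded cases, in the order they were collected.
cases = [
    {"stigma", "cost"},
    {"cost", "family support"},
    {"stigma"},
    {"family support", "cost"},
    {"cost"},
    {"travel"},  # not reached: sampling stops after the fifth case
]
used, categories = cases_until_saturation(cases, patience=3)
print(used, sorted(categories))
```

In actual grounded theory work the decision to stop is a judgement about theoretical insight, not a count; the sketch only makes the idea "stop when further cases add nothing new" concrete.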
(iii) Extreme case sampling: This sampling technique is used when the strategy is to learn
about the field as it is disclosed from its extremities, so as to arrive at an understanding of the
field. Thus here, the researcher purposively integrates extreme or deviant cases. The researcher
here seeks cases that differ from the dominant pattern or that differ from the predominant
characteristics of other cases. Here, the goal is to locate a collection of unusual, different or
peculiar cases having special characteristics. That is why extreme case sampling is also called
deviant case sampling. The cases are selected because they are unusual, and through them the
researcher hopes to learn more about the real social life of the field. For example, suppose the
researcher is interested in studying school dropouts. Let us suppose that the previous research
suggests that the majority of dropouts come from families having low income, a single parent and
from those families that are geographically mobile. A researcher using extreme case sampling
would seek majority-group dropouts who are from stable two-parent, upper-middle income families
who are geographically stable.

(iv) Typical case sampling: Here the researcher selects some typical cases, which may
represent the average or the majority of the cases. Here, the field is in a way disclosed from the
center and from inside. For example, for studying the success or failure of the foreign policy of the
country, the researcher may select a few parliamentarians who are likely to represent the overall
and average view of the public.


(v) Maximum variation sampling: A maximum variation sampling is one where the
researcher deliberately seeks as much variation as possible in the sample. Thus here, the
researcher integrates only a few cases but these are as different as possible so that the range of
variation and differentiation in the field may be disclosed.
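
One way to picture "a few cases that are as different as possible" is a greedy selection over case profiles. The sketch below is an assumption made for illustration, not a procedure given in the text: the attributes, the case labels and the use of a simple count of differing attributes as the measure of difference are all hypothetical.

```python
from typing import Dict, List


def differences(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Number of attributes on which two cases differ (same keys assumed)."""
    return sum(a[key] != b[key] for key in a)


def max_variation_sample(cases: Dict[str, Dict[str, str]], n: int) -> List[str]:
    """Greedily pick up to n case ids that differ as much as possible."""
    ids = sorted(cases)
    if n <= 0 or not ids:
        return []
    chosen = [ids[0]]  # arbitrary but deterministic starting case
    while len(chosen) < min(n, len(ids)):
        remaining = [i for i in ids if i not in chosen]
        # Take the case whose most similar already-chosen case is least similar.
        best = max(
            remaining,
            key=lambda i: min(differences(cases[i], cases[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical case profiles.
profiles = {
    "A": {"income": "low", "family": "single-parent", "area": "rural"},
    "B": {"income": "low", "family": "single-parent", "area": "urban"},
    "C": {"income": "high", "family": "two-parent", "area": "urban"},
    "D": {"income": "high", "family": "single-parent", "area": "rural"},
}
print(max_variation_sample(profiles, 2))
```

In practice the researcher judges variation substantively rather than by counting attributes; the sketch only shows the logic of preferring the case least like those already chosen.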

(vi) Intensity sampling: Here the researcher plans to select cases, events, etc., according to
the intensity of features, processes and experiences, and so on as given or assumed in them. The
researcher prefers cases with the greatest intensity or different intensities which are systematically
integrated for the purpose.

(vii) Sensitive case sampling: Sometimes the researcher may select important or sensitive 
cases only in the sample for the desired outcome. For example, the researcher may select and 
integrate a few top leaders from Hindu and Muslim community to know whether the construction 
of a mosque at the designated place would be in the best interest of the public. 


(viii) Critical cases sampling: Here the sampling plan includes selection of such critical cases in
which the relations to be studied become clear in the opinion of the experts or which are
particularly important for the functioning of the programme to be evaluated.

(ix) Convenience sampling: Here the researcher selects those cases that are easiest to 
access under conditions of study. Such samples are cheap and quick. The person-on-the-road 
interview conducted by television programmes is an example of convenience sampling (or 
sometimes known as haphazard sampling). Television interviewers go on the road with camera 
and microphone to talk to a few people who are conveniently available.

Morse (1998) has also provided some useful suggestions regarding sampling. The suggestion
is clearly aimed at making a distinction between primary selection and secondary selection. Morse
has outlined several general criteria for qualifying a good informant. Good informants should
have the necessary knowledge and experience, they should have the capability to reflect and
articulate, they should have adequate time to be interviewed or observed and should be ready to
participate in the study. Integrating cases according to these criteria is known as primary
selection, which is different from secondary selection. By secondary selection is meant those
cases that somehow don't meet all the criteria previously mentioned but are willing to give some
time for interview. Morse has cautioned that one should not invest many resources in such cases.







Experts have also provided some questions against which to check a qualitative sampling
plan. For example, Miles and Huberman (1994) have suggested six general questions against
which the effectiveness of any sampling plan can be judged. These questions are:

(i) Is the sampling relevant for the conceptual framework and research questions?

(ii) Will the phenomena, in which the researcher is interested, appear? In principle, can they
appear?

(iii) Does the sampling plan enhance the generalizability of the findings?
(iv) Is the sampling plan feasible in terms of time, money and access to people?
(v) Is the sampling plan ethical in terms of such issues as informed consent, potential
benefits and risks and the relationship with informants?
(vi) Can believable descriptions and explanations be produced which are true to real life?


DATA COLLECTION TECHNIQUES IN QUALITATIVE RESEARCH 


Qualitative researchers generally study spoken and written representations and records of human
experiences, using multiple methods and multiple sources of data. There are several techniques
of data collection and several types of data collection might well be used in a qualitative study.
Here we shall concentrate upon the following three most popular data collection techniques of
qualitative research.




1. Interview 
2. Observation
3. Document review


1. Interview 


Interview is one of the main data collection techniques in qualitative research. Interview is used
to gather information regarding the interviewee's experience and knowledge, his or her opinions,
beliefs and feelings as well as demographic data. Interview questions can be asked so as to
determine past or current information as well as predictions for the future. While interviewing is
basically about asking questions and receiving answers, there is much more to it than that,
especially in qualitative research. Following Fontana and Frey (1994), interviewing has a wide
variety of forms and multiplicity of uses. The most common form of interviewing is individual and
face-to-face exchange but it may take the form of face-to-face group interviewing,
self-administered or mailed questionnaire, telephone survey, etc. Interviewing can be structured,
semi-structured or unstructured. An interview can be a one-time brief exchange (say for 5 to 10
minutes) over telephone, or can take place over multiple, lengthy sessions over days as it
happens in life-history interviewing.
There are different types of interviews and much has been written on this topic. For example,
Patton has distinguished between three main types of interviews: informal conversational
interview, general interview guide approach and standardized open-ended interview. Fielding (1996)
has described the types of interviews using the terms standardized, semi-standardized and
non-standardized. Similarly, Fontana and Frey (1994) used a three-way classification of
structured, semi-structured and unstructured interview and they apply it to both individual and
group interview. Besides these, there are many other classifications of interview. Here we will
discuss some of the common types of interviews frequently used in qualitative research.

(i) Structured interview: A structured interview is one where the respondent is asked a
series of pre-established questions with preset response categories. The series of questions used
in the interview is usually called the interview schedule. If the interview is standardized, all
respondents receive the same questions in the same order. Therefore, flexibility and variations are
minimized. Here the interviewer plays a neutral role. The stimulus-response nature of this type of
interview stresses rational rather than emotional information, which is easily quantified, ensures
comparison of questions across respondents and makes sure that certain topics have been
included. However, such an interview leaves little room for unanticipated discoveries.
Sometimes the respondents feel constrained because they are not free to give the information that
they feel is important.


(ii) Unstructured interview: In an unstructured interview, the questions and their
order are not fixed. They are allowed to develop as the interview proceeds.
Open-ended answers allow interviewees to say as little or as much as they choose. Such an
interview is also known as informal conversation interview. The traditional type of unstructured
interview is the nonstandardized, open-ended, in-depth interview, sometimes also called
ethnographic interview. According to Fontana and Frey, there are seven important aspects of
unstructured interview. These seven aspects provide a useful checklist of things to be done when
planning the data collection by means of unstructured interview. The seven aspects are: accessing
the setting, understanding the language and culture of respondents, deciding on how to present
oneself, locating an informant, gaining trust, establishing good rapport and collecting empirical
materials.

How each aspect is handled varies with the nature of the situation and respondents. There 
needs to be flexibility in unstructured interview situation, especially for life-history projects. 
Douglas (1985) has called this creative interviewing. 

Analysis of unstructured interview is time consuming and difficult. Skill in this sort of
interviewing does not come naturally and most of us need specific training to develop that skill.
Here different types of information are collected from different people with different questions.
This makes data organization very difficult.

(iii) Group interview: In group interview, the researcher works with several people
simultaneously. Such an interview is also known as focus group interview. Originally focus group
interview was a particular type of group interview used in marketing and political research.
Group interview can make very important contribution in social science researches. According
to Morgan (1988), the hallmark of focus group interview is the explicit use of the group
interaction to produce such data and insight that would be less accessible without interaction
found in a group.

There are different types of group interviews and like other types of interviews, they can be
unstructured, semi-structured or highly structured. Different types of group interviews have
different purposes. Which type of group interview should be used in a particular research
situation depends on the context and research purposes.

The role of the interviewer or researcher changes in a group interview, functioning more as a
moderator or facilitator and less as an interviewer. Here the process will not be one of putting
questions and receiving answers in alternation; rather, the researcher will be facilitating,
moderating, monitoring and recording group interaction. The group interaction is directed by
questions and topics supplied by the researcher. This automatically means that particular skills
are required of the group interviewer (Fontana & Frey 1994).
The data from group interviews are the transcripts (or other records) of the group's
interaction. They might be used as the only data-gathering technique in a study or in conjunction
with other techniques.

Whatever may be the types of interviews, there are some practical aspects of interviewing
that the interviewer has to keep in mind. Those practical aspects are as under:
(i) Interview respondents
(ii) Managing the interview
(iii) Recording


A brief discussion follows. 







(i) Interview respondents: There are some issues in interviewing that are related to the
respondents who are to be interviewed. The main issues are:

(a) Who are to be interviewed and why?

(b) How many respondents will be interviewed and how many times will each person be
interviewed?

(c) When and for what period will each respondent be interviewed?

(d) At what place will each respondent be interviewed?

(e) How will access to the interview situation be organized?

If we pay attention to these five issues, it is clear that the first two are related to the
sampling plan for the research project, which itself depends upon research
questions and purposes. The next two issues are concerned with time and location that have
impact upon quality of data. Recognition of the importance of these two issues enables the
researcher or interviewer to take decisions that maximize the quality of data in light of ethical
responsibilities towards those being interviewed. The last issue is concerned with gaining access.
How it is done by the researcher depends on the particular research project and its setting and
context. In fact, how the interviewers establish contact with the respondents and organize access
affects the relationship between the interviewee and the interviewer at all stages.


(ii) Managing the interview: For managing the interview successfully, the following five points
must be kept in mind:
(a) Preparation of the interview schedule
(b) Establishing rapport at the beginning of the interview
(c) Communication and listening skills
(d) Sequence and types of questions to be asked
(e) Closing the interview
Experts are of the view that the importance of each of these points is determined by the
nature and type of interview. For example, if the interview is a highly structured one, a schedule
must be developed and pretested, because the quality of preparation will influence the quality of
data. On the other hand, if the interview is an unstructured one, only a general awareness
regarding the questions or topics is enough; these would be kept deliberately open-ended and
there would be no attempt to standardize them. In the case of unstructured interview,
communication skills in general and listening skills in particular are important (Keats 1988;
Minichiello et al. 1990). Minichiello et al. (1990) have reported 16 subskills of listening which
they recommend a good interviewer must practice in order to improve their listening skills.
Asking questions is also important and has been analyzed extensively in research methods 
literature. The analysis includes the way questions are asked, the wording of the questions and
the sequence and types of questions that are asked. Patton (1990) has provided a
classification of questions according to their topics, under which questions may be classified into
categories such as opinion and demographic background. He has also discussed the kinds of
questions that are necessary at different stages during the interview. Closing the interview is
also important. Minichiello et al. (1990) have pointed out several important strategies for closing
the interview.


(iii) Recording: While planning the qualitative research to be done by the
researcher, it is essential to consider how interview data are going to be recorded. If the
interview is a highly structured one, the recording of responses stands simplified because it is
here only making checkmarks on a response sheet. However, for unstructured interview,
particularly open-ended interview, possibilities of tape recording, video recording and/or note
taking clearly exist. Researches have shown that the situation dictates recording. If the interview
is conducted in the field, there does not exist a good opportunity for electronic recording.
Here the researcher will have to keep himself satisfied by note taking. Moreover, possibilities of
using various methods of recording have to be assessed in relation to the practical constraints
of the situation, cooperation of the respondent and type of the interview selected.
A review of the literature has revealed that researchers have used interview by adopting
either of two approaches. One approach is called the miner's approach and
the other is called the traveller's approach (Prakash 2011). In the miner's approach, the
interviewer seeks to unearth the truth and facts lying with the respondents. By means of a
structured or semi-structured schedule, the interviewer tries to uncover the reality lying in the
respondent's experiential world. In the traveller's approach, the interviewer draws on the major
milestones covered during the process of the interview to delineate concepts, interpretations and
inferences. Thus this approach comes nearer to what is known as the constructionist perspective.


Indian psychologists, like western ones, have used interview to address problems of a wide
variety in qualitative research. A few examples will illustrate this. Goyal (1997) conducted
open-ended interview of middle-aged women in his study and reported that gynaecological
difficulties were prominent and financial problems and daughter's marriage produced greater
insecurity and loneliness among them. In a culturally changing village community (Khorwar
tribe) Mishra (1997) used interview for investigating the role of family in health care and
experience relating to health problems among women. His findings revealed that the major
health problems comprised diarrhoea, child mortality, conjunctivitis, unwanted pregnancies and
arthritis. Not only this, women were found to attribute physical, behavioural and supernatural
causes for explaining their health problems. Vasuki and Reddy (1998) conducted open-ended
interview of children in the age group of 9 to 15 years along with their parents. In this study, an
attempt was made to describe parental perceptions of having a single child and children's
perception of being a single child. Results revealed that the majority of the parents as well as
children appreciated the state of having/being a single child.

2. Observation: Observation has a long history in social sciences. For example, it has
been extensively used by psychologists (Irwin 1980, Brandt 1981, Liebert 1995), by sociologists
(Adler & Adler 1994) as well as by educational researchers (Foster 1996). When observation is
used, it usually consists of detailed notation of events, behaviours and the contexts surrounding
the events and observations. In observation practically all the senses such as seeing, hearing,
feeling and smelling are integrated. Observation can be of physical environment, social
interactions, nonverbal communications, planned and unplanned activities and interaction as
well as of unobtrusive indicators. Observation is often referred to as fieldwork because it
commonly takes place in a field.


In literature on observation as the technique of data collection, the terms quantitative and
qualitative are frequently used. Quantitative approach tends to be highly structured and requires
some predeveloped observation schedules, usually in detailed format. Here the researcher has to
take a decision whether he will be using already existing observation schedules or he will be
developing an observation schedule for the purpose.

Qualitative approach to observation is much more unstructured. Here the researcher does
not use any predetermined categories and classifications; rather, he makes observations in a
more natural and open-ended way. Irrespective of the technique of recording, behaviours and
events are observed as they naturally unfold. The rationale here is that categories and concepts
for describing and analyzing the observational data will tend to emerge later in the research
during analysis rather than be brought to or imposed upon the data from the start.

Patton (1990) has proposed five dimensions along which observations may vary. First, the
observer's role may vary from being a full participant to a complete outsider. When the observer
fully participates while doing observation, it is known as participant observation, which is
commonly used in qualitative research. Second, the observer may conduct the observation covertly,
that is, from behind a one-way mirror, or with the knowledge of those being observed, or with
some of those being observed being unaware of the observation. Third, persons being observed may
be given full explanations or partial explanation or false explanation regarding the purpose of
observation. Sometimes no explanation can also be provided. The fourth dimension is duration.
The observation may continue for a year or could be as brief as an hour. The fifth and last
dimension is breadth of focus, that is, observation may vary from being quite broad to quite narrow.
dimension is brea + anciiGeaved observation typically evolves through a series one 
The process of t 7 
activities. It begins with selecting 4 


of 
ini to | | fern 
setting and gaining access 1 II and’ then, starting 
aaervation and redding it as the study proceeds ahead, the nature of observation cha” 
observation and reagin, " | 
typically sharpening In focus an 


ha 


d leading to some clear-cut research questions that , n fs, 
selected observations. sale (1993) has proposed live iil ail TBANIZing ny 
' ) ) 
ge nc sein vote some general research questions are proposed 

(b) Writing field notes 

(c) Looking as well as listening 

(d) Testing hypotheses 

(e) Making broader links 7 aoe ; ort 

Although observation may be nonparticipant or participant, in qualitative research,
participant observation is more common. Participant observation can be understood as a process
in two respects. First, the researcher must gradually become a participant and gain access to the
individuals. Second, the observation must gradually move through a process that concentrates
on the aspects of the research that are essential for providing solution to the research questions.
Spradley (1980) has distinguished three phases of participant observation as under:

(a) Descriptive observation: This is the beginning phase of the observation where the
researcher tries to develop an orientation to the field under study. This is used to grasp
the complexity of the field and, at the same time, develop more concrete research
questions and lines of vision.

(b) Focused observation: This is the second phase of the participant observation that
narrows the perspective of the research on those processes and problems, which are
considered relevant for the research questions.

(c) Selective observation: This phase hints towards the end of data collection and is 
focused on finding further evidence and examples for the types of practices commonly 
found in the second step. 

There are some limitations of participant observation when used as a technique of data
collection in qualitative researches. One limitation is that not all phenomena can be easily
observed in situations. For example, biographical processes are difficult to observe. In fact,
events or practices that seldom occur (and are considered crucial to the research question) can be
captured only by luck or only by a careful selection of the situations of observation. Still another
limitation is that participant observation can hardly be standardized beyond a general
research strategy and it does not make sense to see this as a goal for further methodological
developments (Luders 2004).


Despite all these limitations, observation has been used in the Indian context for a variety of
issues. For example, observational data have been used to understand the quality of health
services. One study has demonstrated that the scheduled caste (SC) and scheduled tribe (ST)
communities in villages experienced discrimination in health services provided at the household
level (Babu et al. 2000). Likewise, Kar et al. (2000) conducted an observation study of dementia
patients in which several behavioural problems were revealed. The observations of caregivers led
the researchers to conclude that the patients were aggressive, verbally abusive and used to
show socially inappropriate behaviour more often.





Observation, whether participant or nonparticipant, is often used in combination with other
techniques. Many researchers prefer to use observation as a technique
of data collection along with other methods to complement the major research activities.
3. Document review: Documents, both historical and contemporary, are a rich source of
data for social research. Documents may be understood as standardized artifacts available in
particular formats such as notes, case reports, drafts, contracts, remarks, diaries, annual reports,
certificates, judgements, letters or expert opinions, biographies, autobiographies, institutional
memoranda, reports and government pronouncements and proceedings (Jupp 1996; Wolff
2004). Researchers have classified this bewildering variety. One of the most popular
classifications of documents has been done by Scott (1990). He has distinguished 12 types of
documents, which are constituted by a combination of two dimensions: authorship (who
produced the document) and access to the documents. The authorship refers to the origin of the
document (in three categories: personal, official-private, official-state) whereas access refers to
the availability of documents to the individuals other than the authors (in four categories: closed,
restricted, open-archival and open-published). This resulted in 3 x 4 = 12 categories of documents.
For assessing the quality of documents, Scott has suggested four criteria which can be used for
deciding whether or not a particular document or set of documents can be useful for the research.

(a) Authenticity: Is the document genuine and of unquestionable origin?

(b) Credibility: Is the document free from error? Has it not been distorted?

(c) Representativeness: Is the document typical of its kind? If not, is the extent of its
untypicality known?

(d) Meaning: Is the document clear and comprehensible?
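
The 3 x 4 = 12 classification attributed to Scott above is simply the cross-combination of the two dimensions, which the following short sketch enumerates. The category labels are taken from the text; the variable names and the order in which the twelve types are numbered are assumptions made for illustration and need not match Scott's own numbering.

```python
from itertools import product

# The two dimensions and their categories, as given in the text.
AUTHORSHIP = ["personal", "official-private", "official-state"]
ACCESS = ["closed", "restricted", "open-archival", "open-published"]

# Every pairing of one authorship category with one access category.
document_types = list(product(AUTHORSHIP, ACCESS))

for number, (authorship, access) in enumerate(document_types, start=1):
    print(f"{number:2d}. authorship: {authorship:16s} access: {access}")
```

A personal diary kept private would, for example, fall under the combination personal / closed, while a published government report would fall under official-state / open-published.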


There are some practicalities in using documents as the source of data collection. One useful
suggestion is that the researcher should not start from a notion of factual reality in the documents
and compare it to other sources of data collection. Since documents represent a specific version
of realities constructed for specific purposes, they should be seen as a way of conceptualizing
information rather than as 'information containers'. Another suggestion is that no part of the
document should be taken as arbitrary and the researcher should start from the
ethnomethodological assumptions of order at all points. This must also include the way a
document is set up (Flick 2009).

In the Indian context, documents have been frequently used for conducting research in the 
field of leadership using documentary text. Sharma (1995) identified leadership issues as 
reflected in Indian thought depicted in the Ramayana, the Bhagavad-Gita and Dharmashastra in 
general. It was observed that the leader has to ensure that group members are committed to 
organizational objectives and has to coordinate the activities of group members. Finally, the 
leadership style is determined by the leader’s own values, confidence in subordinates as well as 
management of uncertainty. 

There are some limitations in using documents as source of data collection. Sometimes the
necessary documents are not available, not accessible or stand lost. In some other cases, there
may be people who can block access to the documents the researcher needs. Another problem
associated with document analysis may be that sometimes the researcher has difficulty in
understanding the content of the documents because he can't decipher the words,
codes or references that have been used or because they are difficult to read.

Besides these techniques of data collection, there are also other techniques of data collection
used in qualitative research. Techniques like questionnaires, surveys and projective techniques
can be used to collect either quantitative or qualitative data. Besides, proxemics (study of
people's use of space and its relationship to culture) and kinesics (study of body movements) are
also used as qualitative data collection techniques. Narratives have also been used as a
qualitative data collection technique. Narratives are used to study people's individual life
stories. Narratives are stories told by a sequence of words, actions or images. Narratives thus
allow researchers to approach the interviewee's experiential yet structured world in a
comprehensive way.

In narratives, first the initial situation is outlined (that is, how things were started),
subsequently the events relevant to the narratives are selected from the entire set of experiences
and presented as a coherent progression of events (that is, how things developed) and finally the
situation at the end of the development is presented (that is, what became) (Hermanns 1995).

In narratives, two methods are very common: narrative interview and episodic interview. In
the narrative interview, the informant or interviewee is asked to present the history of an area
of interest. The interviewer's task is to make the interviewee tell the story of the area of
interest in question as a consistent story of all relevant events from beginning to end. In
episodic interview, question-answer sequences are combined with narratives (of episodes). Thus
episodic interview yields presentations in the form of narrative since these are closer to the
experiences and their generative context than other presentational forms.

Thus we find that there are different methods of data collection in qualitative research. For
better results, researchers generally use more than one method in the study. By selecting a
complementary method, a researcher can cover the weaknesses of one method with the strengths of
another. Thus a good qualitative researcher often tries to include multiple methods of data
collection. The use of multiple data collection techniques is technically known as triangulation,
which permits verification and validation of qualitative data.




DATA ANALYSIS AND INTERPRETATION 
The term ‘data analysis’ itself has different meanings among qualitative researchers. Data analysis, in general, means examining, sorting, categorizing, comparing, synthesizing, evaluating and contemplating the coded data as well as reviewing the raw and recorded data. Since qualitative researchers mainly concentrate upon the study of social life in natural settings, and there are different ways of looking at and analysing social life, there are multiple perspectives and practices of analysing qualitative data. Despite this variety, some experts have tried to identify the common features of qualitative data analysis. For example, Miles and Huberman (1994) have suggested that there are six moves common across different types of analysis. They have labelled their approach as transcendental realism. Tesch (1990) has pointed out that although no characteristics are common to all types of analysis, there are no fewer than twenty-six approaches to the analysis of qualitative data. Likewise, Patton (2002) has also mentioned a number of sources for data analysis and interpretation. The use of computer software in qualitative data analysis is also becoming very popular.


Here an attempt will be made to discuss some specific techniques of qualitative data analysis. These techniques are as under:

1. Miles and Huberman’s technique
2. Grounded theory analysis
3. Narrative and hermeneutic analysis
4. Ethnomethodology and conversation analysis
5. Discourse analysis
6. Documentary and textual analysis




1. Miles and Huberman’s technique: Qualitative data analysis by Miles and Huberman is a comprehensive system that is directed at tracing out lawful and stable relationships among social phenomena. Their analysis has three main components:

(a) Data reduction
(b) Data display
(c) Drawing and verifying conclusions


(a) Data reduction: Data reduction occurs throughout the analysis and it usually takes place in three stages. In the beginning stage, it occurs through editing, segmenting and summarizing the data. In the middle stage, it occurs through coding and memoing as well as through associated activities such as finding themes, clusters and patterns. In the last and final stage, it proceeds through conceptualizing and explaining, because developing any abstract concept is also one way of reducing data. In both qualitative and quantitative research, the objective of data reduction is to reduce the obtained raw data without significant loss of information. In qualitative analysis, an additional precaution against losing information is not to strip the data from their context.

(b) Data display: There are different ways of displaying data. Graphs, charts, networks and diagrams such as Venn diagrams, causal models, etc., are the common modes of data display. Miles and Huberman consider data display important and essential at all stages of analysis because these displays enable data to be organized and summarized and also show how far the analysis has progressed. Thus any good qualitative analysis involves repeated and interactive displays of data.

(c) Drawing and verifying conclusions: The major purpose of qualitative research is to draw conclusions from the data. In fact, the reasons for reducing and displaying data are to assist in drawing conclusions. Although drawing conclusions logically follows data reduction and data display, it in fact takes place at each stage of analysis. The researcher notes down the possible conclusions early in the analysis, but since they are vague and ill-formed, they are held tentative pending further work and are sometimes sharpened during it. They are not formalized until all the data have been analysed. Conclusions are obtained in the form of propositions and, once formed, they also need to be verified.

These three components are interwoven and interactive throughout the analysis. Together, data reduction, data display and drawing conclusions present an overall view of data analysis and involve three main operations: coding, memoing and developing propositions. Coding and memoing are closely related and go together.
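The movement from data reduction to data display can be illustrated with a small sketch. It assumes, purely for illustration, that the segments have already been coded by hand; the participant identifiers, the code labels and the function names are hypothetical and are not part of Miles and Huberman's system. Note that a tally of this kind strips the segments of their context, so the original segments should be kept alongside it.

```python
from collections import Counter, defaultdict

# Hypothetical coded segments: (participant, code) pairs produced during coding.
coded_segments = [
    ("P1", "workload"),
    ("P1", "peer support"),
    ("P2", "workload"),
    ("P2", "workload"),
    ("P3", "peer support"),
    ("P3", "recognition"),
]


def reduce_to_matrix(segments):
    """Data reduction: tally how often each code occurs for each participant."""
    matrix = defaultdict(Counter)
    for participant, code in segments:
        matrix[participant][code] += 1
    return matrix


def display_matrix(matrix):
    """Data display: render the tallies as a plain-text participant-by-code table."""
    codes = sorted({code for counts in matrix.values() for code in counts})
    lines = ["participant | " + " | ".join(codes)]
    for participant in sorted(matrix):
        row = [str(matrix[participant][code]) for code in codes]
        lines.append(participant + " | " + " | ".join(row))
    return "\n".join(lines)


matrix = reduce_to_matrix(coded_segments)
print(display_matrix(matrix))
```

A table of this kind is only one possible display; it helps the researcher see at a glance which codes recur across participants before any conclusion is drawn or verified.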

2. Grounded theory analysis: Previously we have defined grounded theory as both an overall approach to research and a set of procedures for developing theory through analysis of data. Here grounded theory will be discussed only as a procedure for developing theory through analysis of data. The basic aim of grounded theory analysis is to generate abstract theory to explain what is contained in data. All of its procedures are oriented towards this aim.

The essential theme of grounded theory analysis is to find a core category at a high level of abstraction but grounded in data. Grounded theory analysis does this in three steps. The first step is to locate conceptual categories in data and this is done at the first level of abstraction. At the second step, the researcher tries to find relationships among these categories. At the third step, he conceptualizes and accounts for these relationships at a higher level of abstraction. Obviously this means that there are three general types of codes: substantive codes, theoretical codes and core codes. Substantive codes are the initial conceptual categories in the data; theoretical codes connect these categories; and core codes are the higher-order conceptualization of the theoretical coding around which theory is to be built up. Accordingly, there are three objectives of grounded theory analysis.


(a) The first objective is to find conceptual categories and substantive codes in the data. Substantive codes are categories generated from the empirical data but at a more abstract level. In the first level of analysis, some of these substantive codes appear as more central in the data than others.

(b) The second objective is to find out relationships among the substantive codes.

(c) The third objective is to find a higher-order, more abstract construct (that is, the core category) which integrates these hypotheses into theory and which describes and explains them.

The process by which the first objective is achieved is called open coding, the process by which the second objective is achieved is called axial coding and the process by which the third objective is achieved is called selective coding. These may not be done sequentially and they may be overlapping or done concurrently. A discussion of each of these codings is put forward.

Open coding is done at the first level of conceptual analysis. Here analysis begins by fracturing or breaking open the data. The researcher examines the data to condense them into some preliminary analytic categories or codes. Open coding necessarily involves a close examination of data (or some of the data), identifying the concepts or conceptual categories inherent in data and the theoretical possibilities that are carried by data. As we know, concepts are the basic building blocks of theory. In simple words, the purpose of open coding is to use the data to generate abstract conceptual categories, that is, conceptual labels and categories for use in theory building. Open coding is best done in a small group. This helps keep the analysis on track.

The process of labelling is guided by two main activities: making comparisons and asking questions. By making comparisons is meant that the different pieces of data are constantly compared with each other for generating abstract categories. Similar events and incidents are labelled and grouped to form categories. Asking questions means that one type of question is constantly asked in three forms. First, what is illustrated by this piece of data? Second, what does this piece of data represent? Third, what category or property of a category is indicated by this piece of data? These two activities, that is, making comparisons and asking questions, focus directly on abstracting and raising the conceptual level in the data.

In grounded theory, labelling in open coding is also guided by what is called the concept-indicator model (Glaser 1978). According to this model, a concept can have many different possible empirical indicators. When the researcher infers a concept from an indicator in the data, he, in fact, is abstracting, that is, going upwards from a piece of empirical data to a more abstract concept. Since a concept has many indicators, the indicators are interchangeable with each other for the purposes of inferring the concept. It means that indicator 1 is an indicator of the concept and indicator 2, indicator 3, etc., are also indicators of the concept. The researcher compares one indicator with another and assesses their similarities and differences for inferring the concept. The researcher also constantly asks what is indicated by this piece of empirical data.

Axial (or theoretical) coding is the second stage in grounded theory analysis. Here the main categories that have emerged from open coding of the data are interconnected with each other. Thus, it is a complex process of inductive and deductive thinking involving several steps because the researcher here organizes the codes, links them and discovers key analytic categories. The word ‘axial’ has been used by Strauss and Corbin and it indicates the idea of putting an ‘axis’ through the data. This axis, it is assumed, connects categories identified in open coding. However, Glaser (1978) has used a more general term ‘theoretical coding’ for describing this stage. How is the work of interconnecting done? To explain this interrelating work, concepts that tend to connect are needed. The connecting concepts are called theoretical concepts. That is the reason why Glaser has called it theoretical coding. Strauss and Corbin (1997) have suggested a coding paradigm model to describe the set of concepts used for making connections between things. This model is presented in Figure 22.1.





[Figure 22.1 Coding paradigm model; the only label recoverable from the diagram is ‘interactional strategies’.]


The concepts included in each category may become (a) a phenomenon for this category and/or (b) the context or conditions for other categories or, for a third group of categories, (c) a consequence. Thus the model names the possible relations between phenomena and concepts and is used to facilitate the structuring of relations between categories, between phenomena and between concepts. The developed relations and essential categories are repeatedly verified against the data and the related text. The researcher has to move continuously back and forth between inductive thinking (developing categories, concepts and relations from the text, etc.) and deductive thinking (testing the concepts, categories and relations against text), especially against those passages or cases that are different.

From the different categories that originated, only those that seem to be most important and promising for further elaboration are selected. These axial categories are checked for fit with as many passages or cases as possible.

Selective coding is the third stage in grounded theory analysis. This stage continues the axial coding at a higher level of abstraction. The word ‘selective’ is used here because the researcher, for this stage, deliberately selects one aspect as core category or concept and concentrates upon it. At this stage, analysis begins around the core category and the core category becomes the central piece of the grounded theory. In a nutshell, in selective coding the researcher examines previous codes to identify and select data that tend to support the conceptual coding categories that were developed earlier. In selective coding, the primary objective is to integrate and pull together the developing analysis. The theory to be developed must have a central focus and this is the core category of the theory. It must be a central theme of the data and should also be perceived as central by the participants whose behaviour is being studied. Selective coding is thus aimed at developing an abstract, condensed and integrated picture of the data. For Strauss and Corbin, the important concepts here are core category and story line. The core category is defined as the central phenomenon around which other categories are integrated and story line means a descriptive narrative about this core category. When the story line is analyzed, it becomes the core category. In other words, in such analysis the researcher moves from a description of the story line to a conceptualization of the story line. It means telling the story descriptively and it also means finding a conceptual category to encompass what has been described in the story. Finally, theory is formulated in detail and again checked against the data. The procedure of interpreting data ends at the point where theoretical saturation has been reached. Theoretical saturation means a stage where further coding, enrichment of categories, etc., no longer provide or promise any new knowledge. At the same time, this procedure is flexible in the sense that the researcher can re-enter the same text and the same codes from open coding with a different research question and can aim at formulating a grounded theory of a different issue. Thus this stage shows those categories where further data are required and, therefore, directs further theoretical sampling. In the language of grounded theory, this stage is called the systematic densification and saturation of theory.
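Theoretical saturation, the stage at which further coding no longer yields new knowledge, is a qualitative judgement, but a rough count can help a researcher keep track of it. The sketch below is only an illustration under stated assumptions: it treats "no new knowledge" as "no new code labels", and the function name, the `window` parameter and the interview data are all hypothetical.

```python
def saturation_point(codes_per_interview, window=2):
    """Return the 1-based number of the last interview that added a new code,
    once `window` consecutive later interviews have added none.
    Returns None if that situation never arises in the data supplied."""
    seen = set()
    quiet = 0  # consecutive interviews that contributed no new code
    for index, codes in enumerate(codes_per_interview, start=1):
        new_codes = set(codes) - seen
        seen |= new_codes
        quiet = 0 if new_codes else quiet + 1
        if quiet == window:
            return index - window
    return None


# Hypothetical code labels assigned to five successive interviews.
interviews = [
    {"workload", "peer support"},
    {"workload", "recognition"},
    {"recognition", "autonomy"},
    {"workload", "autonomy"},
    {"peer support"},
]

print(saturation_point(interviews))
```

A count of this kind cannot replace the researcher's judgement, since an existing category may still be enriched without any new label appearing; it only signals when further theoretical sampling has stopped producing new categories.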


3. Narrative and hermeneutic analysis: Data analysis approaches based on coding are no doubt valuable in attempts to find and conceptualize regularities in data, but their general limitation is that they break the data into small pieces and also generate the risk of developing a culture of fragmentation (Atkinson 1992). In doing this, such approaches also decontextualize the data. Miles and Huberman have considered this problem as a serious one and have suggested recontextualizing the data. Narratives and stories are ways of dealing holistically with qualitative data. Many qualitative studies are concerned with stories, such as oral and life histories, and biographical accounts. Data can be collected in story form. Even where the data are not fully solicited in story form, they may come with story-like characteristics where the participants give narrative responses to the questions put forward by the interviewers. Narratives and stories have also been considered important in studying lives and lived experience.

Narratives are the way people organize their everyday practices and experiences, and they generally appear in oral or written texts. In fact, narrative is a quality of lived experience and a specific form by which people not only construct their identities but also locate themselves in what is happening around them, both at micro and macro levels. Any narrative shares six core elements: telling a story or a tale, a sense of movement or process (that is, before and after condition), interrelations within a context, an involved individual or collectivity that engages in action, coherence (or the whole that holds together) and temporal sequencing of the chain of events (Neuman 2006).

Many researchers have used stories as a way to capture lived experiences. For example, Brody (1987) and Coles (1989) have used stories for capturing lived experiences in cases of medical illness; Riessman (1993) used them in studying major life events and trauma; and Delamont (1989, 1990) has used them in studies of education from the students’ point of view. Narratives, in fact, try to give a uniquely rich understanding of life situations, and the story is often considered a feasible way of collecting data simply because it is a very common device in social interaction.
Now the question is: how can qualitative data obtained from narratives and stories be analyzed? There are different suggestions from different experts. According to Schutze (1983), the first step in analyzing a narrative interview is to eliminate all non-narrative passages from the text and then to segment only the purified narrative text for further analysis. At the second step, an attempt is made to make a structural description of the contents where different parts of the narrative are specified. At the third step, the researcher moves away from the specific details of the life segments. Finally, the case analyses obtained in this way are compared and contrasted with each other. Bude (1984) has put forward a different view by suggesting the reconstruction of life constructions. His viewpoint is that narratives take into account subjective and social constructions in what is presented, that is, life constructions in the narrative interview. Similarly, Rosenthal and Fischer-Rosenthal (2004) have analyzed narrative interviews in five different steps:
(a) Analysis of biographical data (that is, data of events)
(b) Thematic field analysis (that is, reconstruction of the life story)
(c) Reconstruction of the life history (life as lived)
(d) Microanalysis of individual text segments
(e) Contrasting comparison of life history and life story


In narrative analysis, form and content can be studied together. Narratives can be analyzed for how they convey meanings and experiences. It can be examined, for example, how language is used figuratively. Coffey and Atkinson (1996) have shown how analysis can explore participants’ ways of using imagery and how such devices as metaphors reveal shared meanings and understandings. The more general exploration of the use of linguistic symbols for conveying shared cultural meanings is called domain analysis (Coffey & Atkinson 1996). Metaphors used in social interaction for conveying meaning are also analyzed by qualitative analysts. Miles and Huberman have pointed out some useful properties of metaphors in qualitative analysis. For example, metaphors are used as data-reducing devices, decentering devices and ways of connecting findings to theory.

Hermeneutics is the study of interpretation of text or events in the humanities. Hermeneutical interpretation seeks to arrive at valid interpretations of the meaning of a text. This approach makes a distinction between (a) the subjective meaning that a statement or activity has for the participant and (b) its objective meaning, which can be understood by using the concept of a latent structure of meaning. This structure can be examined by using the framework of a multi-step scientific procedure of interpretation. That is the reason why it is also called objective hermeneutics. Oevermann et al. (1979) have pointed out eight different levels of interpretation in objective hermeneutics.
(a) Paraphrasing the meaning of interactions according to verbatim text reported by the participants

(b) Explication of the interacting participant's intention

(c) Explication of objective motives of the interaction as well as of its objective consequence

(d) Explication of the function of the interaction for determining interactional roles

(e) Characterization of the linguistic features of the interaction

(f) Explication of the interpreted interaction for determining communicative figures

(g) Explication of general relations

(h) Independent test of the several hypotheses formulated at the preceding levels

There are three primary tools of narrative analysis: path dependency, periodization and historical contingency.

Path dependency refers to a process as having a beginning that triggers a structured sequence of events and creates a path that is followed in the chain of subsequent events. The path limits the direction of the ongoing events that follow. In an explanation based upon path dependency, the outcome is highly sensitive to events that occurred early in the process. Here the researcher starts with an outcome. He tries to demonstrate how the outcome follows from a sequence of prior events. As he traces back and displays the effect of each event upon the other, the researcher goes backward in the process to the initial event. Researchers using path dependency explanation assume that there may be one explanation for the starting event and another for the path of subsequent events.


There are two types of path dependency: self-reinforcing and reactive sequence. In the self-reinforcing path dependency explanation, the researcher looks at how an event, once started, continues to operate on its own or tends to propel later events in a direction that resists external factors. Thus the initial trigger event places limits on or constrains the direction of the process. The reactive sequence path dependency emphasizes a different process. Here the researcher focuses upon how each event responds to an immediately preceding one rather than upon tracing a process back to its origin. Thus the researcher very carefully examines each step in a process to see how one step influences the next one. Here the path does not necessarily have to be linear or unidirectional; rather, it can bend or even sometimes reverse course to negate its previous direction. Thus the sequence of events can be like a pendulum, which swings back and forth. Any single event may set into motion a reaction that changes or may also reverse the direction of the events that preceded it.

Periodization is another important tool of narrative analysis, in which the researcher divides the flow of time in social reality into segments or periods. Thus the researcher may, for example, divide 150 years of history into periods by breaking continuous time into discrete units or periods, and define the periods theoretically. Theory helps the researcher in identifying what is significant and what is common within periods or between different periods.
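The idea of breaking continuous time into discrete, theoretically defined periods can be sketched as follows. This is only an illustration: the period boundaries, the period names and the events are hypothetical placeholders, and in an actual study the boundaries would come from theory rather than from equal fifty-year slices.

```python
def periodize(events, boundaries):
    """Assign each (year, label) event to the period whose years contain it.

    `boundaries` is a list of (start_year, end_year, period_name) tuples,
    with both years inclusive. Events outside every period are left out."""
    periods = {name: [] for _, _, name in boundaries}
    for year, label in events:
        for start, end, name in boundaries:
            if start <= year <= end:
                periods[name].append(label)
                break
    return periods


# Hypothetical boundaries dividing 150 years of history into three periods.
boundaries = [
    (1850, 1899, "period 1"),
    (1900, 1949, "period 2"),
    (1950, 1999, "period 3"),
]

# Hypothetical dated events drawn from the narrative material.
events = [
    (1862, "event A"),
    (1905, "event B"),
    (1948, "event C"),
    (1977, "event D"),
]

periods = periodize(events, boundaries)
for name, labels in periods.items():
    print(name, labels)
```

Once the events are grouped in this way, the researcher can compare what is common within a period and what differs between periods.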










In historical contingency, the researcher considers a unique combination of particular factors or specific circumstances that came together in a particular time and place. A contingent situation may be unexpected, but once it occurs, it can profoundly influence subsequent events. Since there may be many possible idiosyncratic combinations of events, a researcher uses theory for identifying important contingent events for an explanation. Most of the time, researchers combine historical contingency and path dependency for better results of the analysis.


4. Ethnomethodology and conversation analysis: Ethnomethodology is defined as a theoretical approach interested in analyzing the methods that individuals use in everyday life to make interaction and routine work effective. It addresses the question of how individuals produce social reality in and through interactive processes (Garfinkel 1969). In fact, ethnomethodology tries to understand ‘folk’ (ethno) methods (methodology) for organizing the world (Silverman 1993). The basic assumption of ethnomethodology is that people have procedures for making sense of their daily life. The primary focus is upon how the central features of a culture, its shared meanings and social norms, are developed, maintained and changed rather than upon the content of meaning (Feldman 1995). This focus forces ethnomethodologists to study activities that people engage in, often without proper thinking. For a better understanding of social action and interaction, language becomes central to the day-to-day activities. Thus social life is mediated through written and spoken communication, and the study of language lies at the heart of ethnomethodology. In this way, conversation analysis becomes of central importance as ethnomethodologists seek to understand and interpret individuals’ methods for producing orderly social interaction. Heath and Luff (1996) have referred to a 1990 bibliography of ethnomethodological/conversation-analytic studies that contains more than 1400 citations in five different languages. The general purpose of such studies is to understand the naturally occurring human conduct in which talk is the primary vehicle for the production of human actions. Where only talk is analysed, verbatim transcripts of actual conversations are used.

Following Silverman (1993), the three fundamental assumptions of conversation analysis are: the structural organization of talk, the sequential organization of talk and the need for empirical grounding of the analysis. Based upon these assumptions, conversation analysis studies the situated production and organization of talk, developing an understanding of how context influences the participants’ production of social reality.




Ten Have (1999) has outlined the following four steps involved in conversation analysis:

(a) Making recordings of natural interaction

(b) Transcribing the tapes either in whole or in part

(c) Analyzing the selected episodes

(d) Reporting the research

A frequent starting point for conversation analysis is to inquire into how certain conversations are opened and what linguistic practices are applied for ending these conversations in an ordered way. There are two essential features of conversation analytic interpretation. First, conversation analytic interpretation is strictly a sequential procedure, that is, it ensures that no later statements or interactions are consulted for explaining a certain sequence. In this way, the order in conversation is clarified by conversation analysis. Second, conversation analytic interpretation emphasizes the context. This means that efforts at producing meaning or order in conversation can be analyzed only as related to the contexts in which they are embedded in the interaction and in which the interaction, in turn, is embedded.
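The strictly sequential character of conversation analytic interpretation, in which a turn is read only against what preceded it, can be sketched in a few lines. The transcript below is invented for illustration and the function name is hypothetical; the sketch shows only the ordering constraint, not the interpretive work itself.

```python
def sequential_contexts(turns):
    """Yield (prior_turns, current_turn) pairs so that each turn is paired
    only with the turns that came before it, never with later ones."""
    for position, turn in enumerate(turns):
        yield turns[:position], turn


# Hypothetical transcript of a short telephone call: (speaker, utterance).
transcript = [
    ("A", "Hello?"),
    ("B", "Hi, it's Sam."),
    ("A", "Oh hi, how are you?"),
    ("B", "Fine. Anyway, I should go."),
    ("A", "Okay, bye."),
]

for prior, (speaker, utterance) in sequential_contexts(transcript):
    print(f"{speaker}: {utterance!r} (read against {len(prior)} earlier turns)")
```

Working through a transcript in this order makes the opening and the closing of the conversation visible as accomplishments built up turn by turn.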

5. Discourse analysis: Discourse analysis is an important development in qualitative research. It includes not only everyday conversations but also other sorts of data such as interviews or media material. In fact, the word discourse captures this broader focus and refers to the general perspective within which ideas are formulated (Sapsford & Abbott 1996). Discourse analysis is very sensitive to how spoken and written languages are used and how accounts and descriptions are constituted. At a microlevel, its elements share much with conversation analysis and therefore, some experts see conversation analysis as a particular type of discourse analysis (McCarthy 1991). At a macrolevel, discourse analysis emphasizes the interrelationships between accounts and hierarchies, power and ideology. Discourse analysis is not a unified body of theory or methods; rather, it is conducted within various disciplines with no overarching unifying theory common to all types (Gee et al. 1992).


Despite this diversity and several disciplinary perspectives, there are some fundamental principles of discourse analysis.

(a) Human discourse is norm-governed and internally structured.

(b) It is produced by speakers whose cultural, economic, social and personal realities shape the discourse.

(c) Discourse itself contains important aspects of that socio-historical matrix.

Thus discourse analysis is concerned with any part of human experience constituted by the discourse.

At a more general level, Jupp (1996) has identified three features of discourse analysis as under:

(a) Discourse is a social process which indicates that words, sentences, etc., and their 
meanings depend on who used them where, why and to whom. Consequently, their 
meanings Can vary according to social and institutional settings. 

(b) There can be different types of discourse, which may be in conflict with each other. 

(c) Besides being in conflict, discourse may be viewed as being arranged in hierarchy. 


At a more specific level, Potter and Wetherell (1994) have also identified three features of discourse analysis that make it relevant to qualitative research.

(a) Discourse analysis is mainly concerned with talk and texts as social practices and therefore, it pays attention to what has been called linguistic content, such as topics and meanings, as well as to the features of linguistic form, such as grammar and cohesion.

(b) Discourse analysis has shown concern with action, construction and variability. Individuals perform actions of different kinds through their talk and writing and they accomplish the nature of these actions partly by constructing their discourse out of a variety of styles, linguistic resources and rhetorical techniques and devices.

(c) Discourse analysis is concerned with the rhetorical or argumentative organization of talk and texts.


Potter and Wetherell (1994) have pointed out that it is very difficult to describe explicit procedures that are used in discourse analysis but they have identified five important considerations. These are: using variation as a lever, reading the details, looking for rhetorical organization, looking for accountability and cross-referencing discourse studies. There has been some differentiation in discourse analysis in the last four years. For example, Parker (2004) has developed a model of critical discourse analysis based upon the background developed by Foucault (1980). That is why this analysis is also known as Foucauldian Discourse Analysis (Flick 2009). Parker has suggested the following seven steps in such an analysis:

(a) The researcher should transform the text to be analyzed into written form, if it is not already done.

(b) The researcher should include free association to varieties of meaning as a way of accessing cultural networks and these should be carefully noted down.

(c) The researcher should systematically itemize the objects in the text or in the selected portion of the text.

(d) The researcher should maintain a distance from the text by treating the text itself as the primary object of the study.

(e) The researcher should systematically itemize the subjects, that is, characters, role positions, etc., in the text.

(f) The researcher should reconstruct the presupposed rights and responsibilities of the subjects.

(g) The researcher should map a network of relationships into patterns, which can be located in the relations of ideology, power and institutions.

6. Documentary and textual analysis: Analysis of various documents and related text is known as documentary and textual analysis. Such analysis shares characteristics with conversation and discourse analysis. But it has its own distinctive themes.

One important theme focuses on the social production of the document and text, starting with how these came into being. All documentary sources are the result of human activity and are always located within the constraints of particular social, historical or administrative settings. The words used and their meanings depend on where they are used. Thus documents and texts studied out of their social context are generally deprived of their real meaning. Obviously, it means that an understanding of the social production and context of the document and texts tends to have a strong impact upon their interpretation.

A second important and related theme is the social organization of the document. Silverman (1993) has raised various questions relating to the study of the social organization of documents, irrespective of any errors they may contain. Such important questions are: How are documents written? For what purposes? On what occasions? Who writes them? What is recorded? What is omitted? What do readers need to know for making sense out of them? These questions tend to provide hints towards how documents and texts have been socially organized. Silverman has applied textual analysis to files, statistical records as well as records of official proceedings and images.

A third theme is concerned with a direct analysis of text and documents for meaning. Such
analysis may focus on the surface meaning (or literal meaning), deeper meaning as well as upon
multilayered nature of meaning. Historians often show concern with surface meaning whereas
sociologists and psychologists are more interested in ways of uncovering deeper meaning.
Methods used range from interpretive understanding to some structural approaches.

A fourth theme is the application of different theoretical perspectives to documentary and
textual analysis. According to Silverman, there are many ways of thinking about textual and
documentary analysis and there are many different theoretical perspectives that can be applied.
Use of computers in the analysis of qualitative data: Qualitative research is undergoing
technological change. A range of software programmes are available, mostly focused on the area
of qualitative data analysis. Therefore, these programmes are sometimes called as QDA
(Qualitative Data Analysis) software or CAQDAS (Computer-Aided Qualitative Data Analysis
Software). The various types of programmes or softwares available for analyzing qualitative data
may be summarized as under:

(a) Word processors which allow the researcher to write as well as to edit text, and to search
for words or word sequences

(b) Text retrieval programmes that allow researchers to search and summarize certain word
sequences

(c) Code-and-retrieve programmes for splitting the text into segments and for retrieving or
listing all segments of the text

(d) Text-based managers for administering, searching, sorting and ordering various text 
passages 

(e) Code-based theory builders that support theory building through steps and
operations not only at the level of text but also at the conceptual level

(f) Conceptual network builders that help a researcher build and test theory by presenting
graphic displays or networks
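
The code-and-retrieve idea in item (c) can be illustrated with a minimal Python sketch. It is not how any of the programmes named in this chapter work internally. The segments, the codebook and the function name `code_segments` are all hypothetical, and assigning codes by keyword is a simplifying assumption, since in practice the researcher usually assigns codes by reading each segment.

```python
from collections import defaultdict


def code_segments(segments, codebook):
    """Attach every code whose keyword occurs in a segment (case-insensitive)."""
    index = defaultdict(list)
    for number, segment in enumerate(segments, start=1):
        lowered = segment.lower()
        for code, keywords in codebook.items():
            if any(keyword in lowered for keyword in keywords):
                index[code].append((number, segment))
    return dict(index)


# Hypothetical interview excerpt, already split into segments by the researcher.
segments = [
    "I felt the teachers listened to me.",
    "The fees were too high for my family.",
    "My teachers helped me after class, but the fees worried us.",
]

# Hypothetical codebook: code name -> keywords that trigger the code.
codebook = {
    "support": ["listened", "helped"],
    "cost": ["fees"],
}

index = code_segments(segments, codebook)

# Retrieval step: list every segment that carries the code "cost".
for number, segment in index["cost"]:
    print(number, segment)
```

A segment may carry more than one code (the third segment above carries both), which is what makes later comparison across codes possible.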

Some of the examples of commonly used programmes are ATLAS.ti, NUD*IST/NVivo and
MAXqda. ATLAS.ti was developed by Muhr (1991, 1994) and is based upon the approach of
grounded theory and coding according to Strauss (1987). The latest version of NUD*IST is NVivo 7
and is only available for PCs. MAXqda was developed by Kuckartz (1995). One can create and
import texts in Rich Text Format (rtf) from anywhere on one's hard disk and from the Internet by
dragging and dropping. Some objects like PowerPoint slides, Excel tables and photos may be
imported as embedded objects of an rtf file.

These three programmes are just examples of the wide developing range of programmes and
versions. More information regarding other programmes can be found at www.soc.survey.ac.uk/
caqdas.

COMPARISON OF METHODS OF QUALITATIVE AND QUANTITATIVE DATA ANALYSIS


We have seen that there are different methods and techniques of data analysis in qualitative
research. If we look for what is common among these various techniques, we shall
arrive at the following general steps of data analysis:

1. The first step in analyzing qualitative research, irrespective of the methods, is organizing 
the data. The qualitative researcher often gets voluminous data from observations, interviews 
and/or documents. The method of organizing these data tends to vary depending upon the 
research strategies and data collection techniques used. For example, interview data are 
generally organized according to individual respondents whereas observation data may be 
organized individually or by grouping similar types of occurrences together. In fact, which 
approach will be taken by the researcher will depend upon the purpose of the research, the 
number of the participants, settings, time under study or on similarities and differences among the 
persons. 

2. The second general step in data analysis in qualitative research is that of description. Here
the researcher discusses different relevant aspects of the study including the viewpoints of the
participants, the setting (both temporal and physical), the purpose of any activity of these participants
and the impact of the activities of the participants, etc.

3. A final step in data analysis of the qualitative research is that of interpretation. According
to Patton (1990), the interpretation involves explaining the findings by answering why questions
and putting patterns into an analytic framework. However, the interpretation of qualitative
research data is much dependent on the researcher's background, skills, biases and knowledge.

A good qualitative researcher also needs to keep in consideration the issue of internal and
external validity of the research. Here internal validity is concerned with the accuracy of
information and the way it matches with reality. Some qualitative researchers try to enhance the
accuracy of their research by the process of triangulation. Likewise, external validity
means that the researcher must keep a vigil on the generalizability of the obtained results and, if
possible, replicate his study.

In the previous section, various methods of analysis of qualitative data have been discussed.
We have also seen the various methods of analysis of quantitative data in different chapters of the
book. Do the two types of methods of analysis have some similarities? What are the
differences between them? Let us see first the similarities and then the differences.

Similarities

1. In both qualitative and quantitative method of data analysis, the researcher
carefully examines the empirical information to arrive at a conclusion based on reasoning
and simplifying the complexity in the data. In fact, both forms of data analysis
anchor statements about the social world in the data.

2. Both types of methods of analysis involve a public method or process. Both types of
investigators collect data, describe them and document how they are collected.
Although the degree to which the method is standardized may bring some
variations, all researchers reveal their research design in some way or the other. In
qualitative research, the research designs are not always explicit but they are implicit in
every such research (King et al. 1994).

3. In both qualitative and quantitative data, the comparison is a central process.
Both types of researchers compare the evidences they have gathered internally or with
other related evidences. By identifying similarities and differences, both types of researchers
try to identify multiple processes, causes or mechanisms within the evidence.

4. In both qualitative and quantitative method of data analysis, the researchers avoid false
conclusions and misleading inferences. They remain alert for possible fallacies. They
sort through various explanations and discussions and try to adhere to the more authentic and
valid ones.

Dissimilarities

The following are the major dissimilarities:

1. Quantitative data analysis is highly standardized and is built upon applied mathematics.
Quantitative researchers choose from a specialized and standardized set of data analysis
techniques. On the other hand, qualitative data analysis is less standardized. Qualitative
research is often inductive and researchers rarely know the details of data analysis when
they begin their project.

2. Quantitative researchers manipulate numbers, which represent empirical facts for
testing a hypothesis with variable constructs. On the other hand, qualitative researchers
tend to create new concepts and theory by blending together several abstract concepts.
Qualitative researchers do not test a hypothesis; rather they try to illustrate evidence,
which shows that a theory or interpretation is plausible.

3. Quantitative researchers begin data analysis only when they have completed data
collection and reduced them to some numerical facts. Subsequently, they manipulate
the numbers for examining pattern or relationship. On the other hand, qualitative
researchers look for a pattern or relationship during the early stage while they are still
collecting data. In fact, the results of early data and analysis guide subsequent data
collection.

4. These methods differ in the degree of distance from the details of social life. Quantitative
researchers assume that social life can be easily measured by numbers, which can be
manipulated by statistics to reveal features of social life. On the other hand, the data
analysis in qualitative research does not depend upon well-established body of
knowledge from mathematics and statistics. Here data are relatively diffuse and
context-based and can have, generally, more than one meaning.

Thus we find that data analysis in qualitative and quantitative researches bear some
resemblances and differences that mark their distinct status.




COMBINING QUALITATIVE AND QUANTITATIVE APPROACHES

Qualitative and quantitative research should not be considered as a mutually exclusive
dichotomy. In reality, for answering all of the questions some
research studies need to include both qualitative and quantitative methods in the same research.
A common example of research using both types of research methodologies is survey research. A
questionnaire may yield both quantitative and qualitative data. Example of quantitative questions are:

How would you describe the performance of your local MLA?
5 very good    4 good    3 neutral    2 poor    1 very poor

Example of qualitative question in the same survey would be:

How would you assess the overall performance of your local MLA?
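
As a rough illustration of how the two kinds of survey answers are handled differently, the following Python sketch codes the closed-ended answers as numbers and keeps the open-ended answers as text. The five-point coding, the responses and the variable names are all hypothetical and serve only to show the contrast.

```python
from collections import Counter
from statistics import mean

# Assumed 5-point coding of the closed-ended item (5 = very good ... 1 = very poor).
SCALE = {"very good": 5, "good": 4, "neutral": 3, "poor": 2, "very poor": 1}

# Hypothetical answers of five respondents to the closed-ended question.
closed_answers = ["good", "very good", "neutral", "good", "poor"]

# Hypothetical answers to the open-ended question; these stay as text
# and would be analyzed by coding and interpretation, not by arithmetic.
open_answers = [
    "He repaired the village road quickly.",
    "She is rarely seen in the constituency.",
]

scores = [SCALE[answer] for answer in closed_answers]   # quantitative data
frequency = Counter(closed_answers)                      # how often each option was chosen
average = mean(scores)                                   # a simple numerical summary

print(scores)
print(frequency["good"], average)
print(len(open_answers), "open-ended answers kept as text")
```

Only the closed-ended item lends itself to counting and averaging; the open-ended answers carry the respondents' own wording and reasons.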


Before paying attention to combining qualitative and quantitative approaches, let us look at
the central characteristics of the two approaches.

Quantitative approach tries to understand reality in terms of variables and relationship
among them. It is based on measurement and therefore, prestructured data, conceptual
frameworks and design. It has well-developed methods for data analysis. Such methods are based
upon the principles of statistics and mathematics. It places emphasis upon variable-oriented
analysis. Its methods, in general, are unidimensional and less variable. Samples in quantitative
research are larger than in qualitative studies.


Qualitative approach deals more with cases rather than with variables. It aims at an in-depth
and holistic understanding for doing justice to the complexity of social life. Such an approach is
sensitive to context, to lived experiences as well as to local groundedness and a good researcher
always tries to get closer to what is being studied. In qualitative research, samples are usually
small and its sampling is guided by theoretical rather than probabilistic considerations. Here the
researcher rarely prestructures the research design and data. Also, methods of analysis are less
standardized. They are also more multidimensional, more diverse and less replicable.


Both quantitative and qualitative researches have their strengths. The central characteristics 
of the quantitative research discussed above show that quantitative researches enable 
standardized and objective comparison. Not only this, measurements involved in quantitative
research permit a description of the situation or behaviour under study in a systematic and
comparable way. Obviously then for the researcher, it becomes very easy to sketch the contours 
of these situations. The quantitative researcher also brings objectivity into the research. Here the 
researcher can systematically answer the important questions relating to the research and this 
automatically opens the way to the development of some useful knowledge. 


Likewise there are some strengths of qualitative research, too. Qualitative research is more 
flexible than quantitative research and due to this flexibility, such research is well-suited for 
studying naturally occurring real-life situations. Qualitative research is considered the best 
means of studying the lived experience of the people including people’s meanings and purposes. 
In other words, it is considered most appropriate for knowing about local groundedness of the
things as well as what is called actor's definition of the situation. Since qualitative data have the
sense of holism and richness, they are well-suited for dealing with the complexity of social
phenomena.


From the above description about the central characteristics and strengths of quantitative
and qualitative research, two things are clear. First, the researcher needs to be very much clear
regarding what he is trying to find out. Second, the researcher can't find everything he wants to
know by using any one approach and therefore, he has to enhance the scope and power of
research by combining both the approaches.








How can quantitative approach and qualitative approach be combined? What is the
justification for combining these two approaches? At the most general
level, the reason for combining these two approaches is to capitalize on the strengths of
the two approaches as well as to compensate for the weaknesses or limitations of the
approaches. At the specific level, the reasons for combining these two approaches should
be considered in light of the practical circumstances and context of the research.

Another question regarding combining these two approaches is: What is meant by
'combining' here? Should the methods of the two approaches be combined? Or should the data
be combined? Or should the methods, data and findings all be combined?
There are studies, which have combined methods, data and findings of the two approaches. Such
studies are called as multi-method studies (Brewer & Hunter 1989). Such studies have also
been covered by Fielding and Fielding (1986). Besides multi-method studies, sometimes the term
mixed method is also used to describe different types of combination of quantitative research and
qualitative research.

Experts have pointed out that in considering how the two approaches are combined, the following
questions are important:

(a) Are the two approaches to be given equal weight?

(b) Are the two approaches separate or interactive?

(c) What is the logical relation of the two approaches? Are they really integrated in a
multi-method design?

(d) What are the criteria used for evaluating the research? Is there a domination of a
traditional view of validation or are both forms of research evaluated by appropriate
criteria?

In fact, answering these questions and considering their implications permits the researcher
to use some sensitive design of qualitative and quantitative research in a reflexive way.

Different experts have written about combining these two approaches (Miles & Huberman
1994; Brannen 1992; Brewer & Hunter 1989; Bryman 1992 and Creswell 1994). A more
comprehensive treatment has, however, been given by Bryman (1992) who has provided eleven
ways of integrating quantitative and qualitative research. Therefore, we shall exclusively
concentrate here on Bryman's approach.

1. Logic of triangulation: The findings of qualitative research can be checked against the
findings of quantitative research. The purpose is generally to enhance the validity of results.

2. Qualitative research can facilitate quantitative research: Qualitative research can
provide background information regarding context and subjects, act as a source
of hypotheses as well as facilitate scale construction.

3. Quantitative research can support qualitative research: It means that quantitative
research may provide help regarding choice of subjects for a qualitative investigation.

4. Quantitative and qualitative research are combined for providing a more general picture
of the issue under study: Quantitative research may be undertaken to plug the gaps in
qualitative research that may arise, for example, due to the fact that the researcher can't
make himself available at more than one place at a time. Alternatively, it is just possible
that not all issues can be addressed solely by quantitative research or solely by
qualitative research.

5. Combining the strengths of qualitative and quantitative research: Structural features of
social life are analyzed with quantitative research and processual aspects are analyzed
with qualitative approach. These strengths can be brought together in a single study.

6. Researchers' and subjects' perspectives: Quantitative research is usually driven by the
researchers' perspective whereas qualitative research takes the subjects' perspective as
the point of departure. These two emphases may be brought together in a single study.

7. Problem of generality: The addition of some quantitative evidences may help the
researcher in mitigating the fact that only limited generalization is possible from
qualitative research.

8. Qualitative research may support the interpretation of relationships between variables:
Quantitative research helps the researcher in establishing the relationship among
variables but is often weak in explaining the relationship among variables.
Qualitative research can, however, be used for helping in explaining the factors
underlying the established relationship.

9. Relationship between micro and macro levels: When both qualitative and quantitative
research are used in the same study, the micro-macro gulf is likely to be bridged.
Quantitative research can often tap large-scale structural features of social life whereas
qualitative research tends to tap small-scale behavioural aspects. When the researcher
wants to explore both the levels, integrating quantitative and qualitative research
becomes a wise step.

10. Stage in the research process: Qualitative and quantitative research may be
appropriate to different stages of longitudinal research.

11. Hybrids: This form of research uses qualitative research in quasi-experimental design
(quantitative research design).


Miles and Huberman (1994) have discussed four general designs which link qualitative and
quantitative research design. In the first design, there is integrated collection of both types of data.
In the second design, a multiwave survey is done in parallel with continuous fieldwork. In the
third design, an exploratory fieldwork leading to quantitative instruments is conducted and
quantitative data collection and analysis is followed by further qualitative work. In the fourth
design also there occurs alternation where a survey is followed by in-depth qualitative work,
which then leads to an experiment to test some hypotheses. Creswell (1994) has pointed out
three designs which govern linkage of qualitative and quantitative research. The first design is
called as two-phase design where quantitative and qualitative phases are separated from each
other. The second is dominant/less dominant design where the researcher conducts a study
within a single dominant paradigm with one small component drawn from the alternative
paradigm. The third is the mixed-methodology design where some aspects of the two approaches
are mixed at all the stages of the research.



Review Questions

1. Give the meaning and essential features of qualitative research.
2. Explain Maxwell's model of qualitative research.
3. Trace the history of qualitative research.
4. Discuss the major themes relating to design strategies of qualitative research.
5. Citing examples, outline the major theoretical perspectives of qualitative research.
6. Citing relevant examples, point out the major research design strategies of qualitative
research.
7. Critically examine the different sampling techniques of qualitative research.
8. Discuss the major data collection techniques of qualitative research.
9. How does a qualitative researcher analyze and interpret data obtained from the
research?
10. Give an outline of the important strategies for combining quantitative and qualitative
research.
11. Make a comparative study of method of analysis of qualitative research and quantitative
research.
12. Write short notes on the following:
(a) Discourse analysis
(b) Conversation analysis
(c) Case studies
(d) Document review
(e) Grounded theory analysis





23
CARRYING OUT STATISTICAL ANALYSES

CHAPTER PREVIEW

Sample and Population
Estimation, Confidence Intervals and Confidence Limits
Normal Curve
Area under the Normal Curve
Non-normal Distribution: Skewness and Kurtosis
Measures of Relative Position: Standard Scores
Linear Standard Scores
Normalized Standard Scores
Parametric and Nonparametric Statistical Tests
Parametric Statistics
Student's t and z test
F Ratio
Pearson r
Nonparametric Statistics
Chi-square Test
Mann-Whitney U Test
Rank-difference Methods
Coefficient of Concordance
Median Test
Kruskal-Wallis H Test
Friedman Test
Correlation and Regression
Major Terms and Issues in Correlation and Regression
Choosing Appropriate Statistical Tests


SAMPLE AND POPULATION 


In carrying out a statistical analysis it is important to distinguish between the two commonly used
terms, namely, population and sample. A population may be defined as the totality of a particular
characteristic for any specified group of individuals or objects. Thus, we may speak of a
population of arts graduates, a population of science graduates, a population of science teachers,
a population of medical books in the library, a population of university employees, etc. A sample
may be defined as any selected number from a population. Usually, this selection is done
according to some rule or plan. By studying the sample, some inferences are made about the
population. A measure based upon a sample is known as a statistic and a measure based upon the
population and inferred from a statistic is known as a parameter. A statistic, then, may be defined
as a mathematical measure (based upon a sample) which helps in gathering, organizing,
analyzing and interpreting the obtained data. Based upon the method of selection, a sample may
be of two general types, namely, a probability sample and a nonprobability sample. The former is
based upon a method in which chance plays a role and the latter involves a method in which
samples are selected according to some judgement and the probability of the individual being
selected is not known. A detailed discussion of the types of probability and nonprobability samples has
been made in Chapter 14.
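
The distinction between a parameter and a statistic can be made concrete with a small Python sketch. The population of 1,000 scores is simulated and entirely hypothetical, and simple random sampling is used here only as one convenient example of a probability sample.

```python
import random
from statistics import mean

random.seed(14)  # fixed seed so that the sketch is repeatable

# Hypothetical population: test scores of 1,000 science graduates.
population = [random.gauss(50, 10) for _ in range(1000)]

# Probability sample: every individual has the same known chance of selection.
sample = random.sample(population, 50)

parameter = mean(population)   # measure based upon the whole population
statistic = mean(sample)       # measure based upon the sample

print(round(parameter, 2), round(statistic, 2))
```

In real research the parameter is usually unknown, and the statistic is what the researcher computes in order to infer it.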
It is a popular misconception among students that a sample is a carbon copy of or has the
identical properties of a population. In fact, when several samples are taken from the same
population, they will vary from each other. But the nature of their variations, as pointed by the
Central Limit Theorem, is reasonably predictable. Hence, a researcher, despite these variations,
can make inferences about a population on the basis of studying a sample. The Central Limit
Theorem describes the nature of sample means and helps the researcher in making
inferences about the population with some known probabilities of error. The theorem states
that if a number of equal-sized samples (usually greater than 30 in size) is randomly taken
from a population, the following three predictions will be true:

1. The means of the samples will be distributed approximately normally.

2. The mean of sample means will approximate the mean of the population.

3. The distribution of sample means will have its own standard deviation which is
technically known as the standard error of mean.
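
The three predictions can be checked by simulation. The following Python sketch is only an illustration under assumed conditions: the population is hypothetical and deliberately non-normal (uniform), the sample size of 36 and the 2,000 repetitions are arbitrary choices, and the comparison value for the standard error uses the standard formula, population standard deviation divided by the square root of the sample size, which is not stated in the text above.

```python
import random
from statistics import mean, pstdev

random.seed(23)  # fixed seed so that the sketch is repeatable

# Hypothetical, clearly non-normal population (uniform between 0 and 100).
population = [random.uniform(0, 100) for _ in range(20000)]
mu = mean(population)
sigma = pstdev(population)

n = 36     # size of each sample (greater than 30)
k = 2000   # number of equal-sized random samples

sample_means = [mean(random.sample(population, n)) for _ in range(k)]

mean_of_means = mean(sample_means)       # prediction 2: close to mu
standard_error = pstdev(sample_means)    # prediction 3: SD of the sample means
expected_se = sigma / n ** 0.5           # standard formula: sigma / sqrt(n)

# Prediction 1: if the sample means are roughly normal, about 68% of them
# should lie within one standard error of the population mean.
within_one_se = sum(abs(m - mu) <= expected_se for m in sample_means) / k

print(round(mu, 2), round(mean_of_means, 2))
print(round(standard_error, 2), round(expected_se, 2))
print(round(within_one_se, 2))
```

Even though the individual scores are spread evenly between 0 and 100, the sample means cluster around the population mean in a roughly bell-shaped way.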

It can, therefore, be said that statistics are the most important tools in the hand of researchers,
which enable them to make inferences or generalizations about a population with known
possibilities of errors on the basis of observation of the characteristics of a sample. Such
inferences are known as statistical inferences. In any statistical analysis if we are dealing with the
entire population, we use descriptive statistics. On the other hand, if we deal with a sample, our
analysis relates not only to the characteristics of the sample but it also provides information about
the population. Such statistics are known as inferential statistics or inductive statistics or
sampling statistics.

Students should also know about the meaning of hidden population. By hidden population
is meant a population of people who engage in clandestine, socially disapproved or concealed
activities and who are very difficult to identify. Locating hidden populations is
considered very important in studies of deviant and stigmatized behaviour. For example, AIDS
researchers have to draw samples from hidden populations. Many researchers often combine
probability and nonprobability sampling for special situations such as for studying hidden
population.


NORMAL CURVE 


The concept of normal distribution is of utmost importance in statistical theory and practice, and 
no one is expected to interpret a statistic successfully without some understanding of the concept 
of normal distribution. 


In the 18th century gamblers were interested in finding out the chances of beating an
opponent at different gambling games and for this, they approached mathematicians. De Moivre
in 1733 was the first such mathematician who developed a mathematical equation of the normal
curve. Later on, Gauss and Laplace in the early 19th century rediscovered the normal curve
independent of De Moivre's work. Gauss was primarily interested in the problems of astronomy,
which led to the consideration of a theory of errors of observation. Hence, in that century the
normal curve came to be known as the 'normal law of error'. In the middle of the 19th century,
Quetelet promoted the applicability of the normal curve by popularizing the view that most of
the problems in the field of anthropology, sociology, meteorology and human affairs could be
solved with the help of the normal curve. In the latter half of the 19th century, Galton
systematically undertook the study of individual differences and during this systematic study he
found that most of the physical and psychological traits of human beings conformed reasonably
to the normal curve. In this way, he extended the applicability of the normal curve. Today the
normal curve is known by various names such as the Gaussian Curve, De Moivre's Curve, the
bell-shaped curve or the curve of error.

A normal curve is one which graphically represents normal distribution. By definition, a
normal distribution is one in which the majority of cases are located in the middle of the scale
and only a small number of cases are located at the extremes. In psychology, sociology and
education, no distribution of a variable ever takes the absolute form of the normal curve, but many
are close to this absolute form and we assume that they are normally distributed.

The major characteristics of a normal curve are enlisted below:

1. A normal curve is always symmetrical, that is, the right half of the curve is equivalent to
the left half of the curve.

2. A normal curve is unimodal and the mode is always at the centre of the distribution. In
fact, in a normal curve, the mean, the median and the mode are numerically identical
and fall at the centre of the distribution.

3. A normal curve is asymptotic to the x-axis. Hence, a normal curve never touches the
baseline, no matter how far the curve is stretched.

4. In a normal curve, the highest ordinate is at the centre. All ordinates on both sides of the
distribution are smaller than the highest ordinate.

5. A normal curve is continuous.
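
Several of these characteristics can be verified numerically. The Python sketch below evaluates the standard equation of the normal curve, which is not given in the text above and is supplied here as an assumption; the function name `normal_density` and the chosen points are illustrative only.

```python
import math


def normal_density(x, mean=0.0, sd=1.0):
    """Height (ordinate) of the normal curve at the point x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


peak = normal_density(0.0)            # ordinate at the centre (the mean)
left = normal_density(-1.5)           # ordinate 1.5 SD below the mean
right = normal_density(1.5)           # ordinate 1.5 SD above the mean
far_tail = normal_density(6.0)        # ordinate far out in the tail

print(round(peak, 4))                 # the highest ordinate
print(left == right)                  # symmetry about the mean
print(far_tail > 0, far_tail < left)  # tail gets close to the baseline but stays above it
```

On a computer the ordinate eventually rounds to zero for extremely distant points; that is a limit of floating-point arithmetic, not a property of the curve itself.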


Area under the Normal Curve 


As we know, there is a definite relationship between standard-deviation units and the area under the normal curve. Figure 23.1 shows the different percentages of areas falling under a normal curve at different standard-deviation units. One standard-deviation unit taken on each side of the mean includes a total area of 34.13% + 34.13% = 68.26% of the curve. This is approximately two-thirds of the cases. In terms of probability it can be said that in any normally distributed sample, chances are that two out of three scores will fall within the area of one standard-deviation unit on each side of the mean. A second standard-deviation unit taken beyond the first standard deviation cuts off 13.59% of the area on each side of the mean (Figure 23.1). Thus, up to two standard-deviation units on each side of the mean we have 68.26% + 27.18% = 95.44% of the total area. If we take another, or a third standard deviation, we have 2.15% of area on each side of the mean. The total areas included by all the six standard-deviation units (three standard-deviation units on each side) account for 99.74% of the area of the normal curve. This means that only 0.26% of the cases are left which lie beyond three standard-deviation units from the mean. For convenience in statistical computation, we generally take and analyze cases up to three standard-deviation units on each side of the mean in normal distribution.

Fig. 23.1 Different percentages of area under the normal curve at various standard-deviation units
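The percentages quoted above can be checked numerically. The following is a minimal sketch in Python, using only the standard library; the function name `area_within` is an illustrative choice and not part of the text. It gives about 68.27%, 95.45% and 99.73%, which agree with the tabled values of 68.26%, 95.44% and 99.74% to within the rounding of the printed table.

```python
import math

def area_within(k: float) -> float:
    """Proportion of a normal distribution lying within k standard
    deviations on each side of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = area_within(k) * 100
    print(f"within {k} SD on each side: {inside:.2f}%  beyond: {100 - inside:.2f}%")
```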




In psychological and educational researches, the normal curve has the main practical applications given below.

1. A normal curve helps in transforming the raw scores into standard scores.

2. A normal curve helps in calculating the percentile rank of the given scores.

3. If we want to normalize the obtained distribution, a normal curve is of immense importance.

4. A normal curve helps in testing the significance of the obtained measures against a chance hypothesis and thus enables the researcher to make a generalization about the population from which the sample was drawn.

5. The data obtained on the basis of the responses to attitude scales, ratings or rankings may be scaled in terms of the normal curve, so that qualitative data are expressed in numerical values by making suitable transformations.

Researchers in the field of psychology and education prefer a normal distribution of test scores, even though many other distributions are theoretically possible. Why? There are several reasons for it. Some of the important reasons are as under:

(a) One popular reason why researchers prefer normal distributions is that the normal curve has some useful mathematical features that form the basis for several kinds of statistical investigation. Let us take an example to illustrate this fact. Suppose a researcher wanted to know whether the average (mean) intelligence of first-year medical students and first-year engineering students of an institution were significantly different. An inferential statistic such as the t test for the difference between means would be the most appropriate statistic. However, this inferential statistic (and many others also) is based upon the assumption that the underlying population of scores is normally distributed or nearly so. Thus, for facilitating the safe and smooth use of inferential statistics, researchers prefer that the test scores in the population follow a normal or near-normal distribution.

(b) Another reason for preferring the normal distribution is its mathematical precision. Since the normal distribution is accurately defined in mathematical terms, it is possible to compute the area under different regions of the curve with accuracy. For example, from the normal curve we can easily determine that the vast bulk of scores (more than 68 per cent) fall within one standard deviation of the mean in the positive and negative directions.


(c) The third reason for preferring a normal distribution of test scores is that a normal distribution of test scores often arises spontaneously in nature. Due to this reason, early investigators considered the normal curve a law of nature. In fact, important human characteristics, both physical and mental, tend to produce a close approximation to the normal curve, especially when measurements for large and heterogeneous samples are graphed. Physical characteristics like birth weight and brain weight, and mental characteristics like intelligence, do have a near-normal distribution.


Nonnormal distributions: Skewness and Kurtosis 


Nonnormal distributions are those distributions of scores which deviate from the normal one. Nonnormality of distribution arises when some of the hypothetical factors determining the strength of a trait are dominant or prepotent over the others and, therefore, are present more often than chance will allow. Two important aspects of nonnormal distributions frequently studied by psychologists and educational researchers are skewness and kurtosis.


Skewness refers to the extent to which a frequency distribution departs from a symmetrical shape. If the test scores are piled up at the low (or left) end of the scale and are spread out gradually toward the high or right end of the distribution, it is said to be positively skewed. On the other hand, when the test scores are piled up at the high or right end of the scale and are spread gradually toward the low or left end of the distribution, it is said to be negatively skewed (cf. Figures 23.2, 23.3). When the skewness is positive, the mean lies to the right of the median and when the skewness is negative, the mean lies to the left of the median.


Fig. 23.2 Negative skewness (frequency of score plotted against score, from low to high)

Fig. 23.3 Positive skewness (frequency of score plotted against score, from low to high)





In a normal distribution, the mean equals the median exactly and therefore, skewness is zero. But in nonnormal distributions these two differ and accordingly, skewness departs from zero. There are several formulas for calculating skewness. In terms of percentiles and median, skewness can be calculated as under:

$$S_k = \frac{\dfrac{P_{90} + P_{10}}{2} - Mdn}{D} \qquad (23.1)$$

where, $S_k$ = skewness; Mdn = Median; D = difference between the values of $P_{90}$ and $P_{10}$.

Suppose for a distribution of intelligence scores, we get the following statistics:

Mdn = 80, $P_{10}$ = 20, $P_{90}$ = 150, N = 100

then,

$$S_k = \frac{\dfrac{150 + 20}{2} - 80}{150 - 20} = \frac{85 - 80}{130} = \frac{5}{130} = 0.038$$


In order to answer whether this distribution is skewed significantly or whether it may well be merely a chance fluctuation from a skewness of zero, we need to calculate its standard error. In this situation, the standard error of skewness is calculated by formula 23.2:

$$\sigma_{sk} = \frac{0.5185}{\sqrt{N}} \qquad (23.2)$$

where, $\sigma_{sk}$ = standard error of skewness.

By substituting the value of N in formula 23.2, we get

$$\sigma_{sk} = \frac{0.5185}{\sqrt{100}} = \frac{0.5185}{10} \approx 0.051$$

Dividing the skewness, that is, 0.038, by its standard error, that is, 0.051, we get $S_k / \sigma_{sk}$ = 0.745 as the value of z. Since this is smaller than 1.96, it is obviously not significant. Therefore, the obtained skewness 0.038 is not significantly different from zero. So, we conclude that there is no evidence to indicate that intelligence scores are not distributed symmetrically. This conclusion should be noted carefully and the erroneous inference that we have thereby proved that the scores are distributed symmetrically should be avoided. The fact that the skewness is not significantly different from zero does not justify concluding that the skewness is zero.
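The worked example can be reproduced in a few lines. What follows is a minimal sketch in Python; the function names are illustrative and not taken from the text. Because the code carries full precision through each step, it gives a z of about 0.74 rather than the 0.745 obtained from the rounded intermediate values 0.038 and 0.051; the conclusion is the same.

```python
import math

def percentile_skewness(p10: float, p90: float, median: float) -> float:
    """Formula 23.1: ((P90 + P10) / 2 - Mdn) / D, with D = P90 - P10."""
    d = p90 - p10
    return ((p90 + p10) / 2 - median) / d

def skewness_standard_error(n: int) -> float:
    """Formula 23.2: 0.5185 / sqrt(N)."""
    return 0.5185 / math.sqrt(n)

# Statistics from the example in the text
sk = percentile_skewness(p10=20, p90=150, median=80)
se = skewness_standard_error(100)
z = sk / se

print(f"skewness = {sk:.3f}, standard error = {se:.5f}, z = {z:.2f}")
print("significant at the .05 level" if abs(z) >= 1.96 else "not significant at the .05 level")
```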





Skewness can also be calculated in terms of mean, median and standard deviation as under:

$$S_k = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} \qquad (23.3)$$

or,

$$S_k = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} \qquad (23.4)$$

Its standard error, here, is given by

$$\sigma_{sk} = \sqrt{\frac{3}{2N}} \qquad (23.5)$$


What does a researcher do about skewness? There are several answers to this question.

(i) The researcher computes the skewness and its standard error. If the skewness is not significantly different from zero, he simply ignores it.

(ii) After computing the skewness and its standard error, if the researcher finds that it is significantly different from zero, he thinks about it, looks at the distribution again and concludes that the distribution is not very skewed, but says or does nothing beyond reporting the skewness and its standard error. This looks like a very casual procedure.

(iii) The researcher tends to alter the shape of the distribution, forcing it to become normal and making $S_k$ = 0.00. This process is called normalization and has been widely used in defining T scores.

(iv) The researcher may transform the skewed distribution of scores by using some function of X which will produce a distribution with little or no skewness. For example, if the distribution of X scores is positively skewed, such functions as $\sqrt{X}$ (square root transformation, such as X = 2 becomes $\sqrt{2}$ = 1.41; X = 6 becomes $\sqrt{6}$ = 2.45, and so on) will increase the spacing among the low scores and may reduce the spacing among the high scores. Thereby the value of skewness will be considerably reduced.
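The effect of the square root transformation can be illustrated with a small sketch in Python. The scores below are hypothetical and were chosen as perfect squares so that their square roots are exactly symmetrical; skewness is measured with formula 23.3, and the use of the population standard deviation (`pstdev`) is an assumption made for the illustration.

```python
import math
from statistics import mean, median, pstdev

def pearson_skewness(scores):
    """Formula 23.3: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(scores) - median(scores)) / pstdev(scores)

# Hypothetical positively skewed scores (piled up at the low end)
raw = [1, 4, 9, 16, 25, 36, 49]
transformed = [math.sqrt(x) for x in raw]   # 1, 2, ..., 7

before = pearson_skewness(raw)
after = pearson_skewness(transformed)
print(f"skewness before: {before:.3f}, after square root: {after:.3f}")
```

With real data the transformed distribution will rarely become perfectly symmetrical, but the direction of the change is the same: the long upper tail is pulled in.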


What are those situations that lead to skewness? Researchers have identified five such situations that generally produce skewness.

(i) There may be a natural restriction at the low end of the scale. This leads to positive skewness. The examples are number of offspring, commission earned by life insurance agents, minimum speed of the car at stop signals, etc.

(ii) There may also be a natural restriction at the high end of the scale resulting in negative skewness. An example is the number of rats that can be kept in a box.

(iii) There may also be an artificial restriction at the low end of the scale and this may produce positive skewness. A good example is the distribution of scores earned by the examinees on an extremely difficult test.

(iv) There may also be an artificial restriction at the high end of the scale resulting in negative skewness. An example is the distribution of scores earned by the examinees on an extremely easy test.

(v) Skewness may be found as a function of the measuring system. One example is a record of time required to complete a task when the researcher is really interested in speed.

How is the skewness measure interpreted? When computed by formula 23.1, the maximum value of skewness can be +0.50 and the minimum value of skewness can be -0.50. Symmetry or lack of skewness is denoted by 0.00. If skewness is positive and $S_k / \sigma_{sk}$ or z is +1.96 or greater, we have obvious reason to conclude that the lopsidedness in our sample of cases is not just accidental and that the trait we are measuring is really positively skewed. It means that if we were to select some other group similar to the one used in the research, we would again find that deviations from the mean or median go in the positive direction rather than in the negative direction. However, if skewness is negative and if $S_k / \sigma_{sk}$ is negative and numerically equal to or greater than 1.96 (that is, -1.96 or lower), we may likewise conclude that the trait is really negatively skewed. On the other hand, whenever $S_k / \sigma_{sk}$ lies between -1.96 and +1.96, it means that we have failed to prove that the skewness exists. Here two points are, however, to be clearly kept in mind: (a) this holds regardless of the value of skewness and (b) this never means we have established that the curve is symmetrical.

In psychological and educational testing, skewed distributions usually signify that the test constructor has included too few easy items or too few hard items. For example, when scores are concentrated at the low end (positive skewness), it means that the test contains too few easy items for making effective discriminations at this end of the scale. In such a case, the examinees who have obtained zero or near-zero scores might actually differ with respect to the dimension measured. But the test is not able to find out these differences because most of the items are too hard for the examinees. On the other hand, if scores are massed at the high end (negative skewness), it means that the test contains too few hard items for making effective discriminations at this end of the scale.

The most direct solution to such skewness is to add items or modify the existing items so that the test has more easy items (for reducing positive skewness) or more hard items (for reducing negative skewness). If it is too late to revise the test items, the test constructor can use a statistical transformation to help produce a more normal distribution of scores. However, the preferred strategy is to revise the test so that the skewness is kept at the minimum or becomes nonexistent.


Kurtosis refers to the peakedness or flatness of a frequency distribution as compared with the normal distribution. In simple words, kurtosis is the extent to which a curve is more or less peaked than a normal curve with the same standard deviation. When a frequency distribution is more peaked than the normal one, it is said to be leptokurtic; when flatter than normal, it is called platykurtic; and when the frequency distribution is neither peaked nor flat-topped but the same as the normal curve, this medium kurtosis is described as mesokurtic (see Figure 23.4).

Fig. 23.4 Types of kurtosis: curve A is leptokurtic; curve B is mesokurtic; curve C is platykurtic

Measures of kurtosis are not widely used. Hence, only a simple formula for calculating kurtosis in terms of percentiles is given below.


$$K_u = \frac{Q}{P_{90} - P_{10}} \qquad (23.6)$$

where, $K_u$ = Kurtosis; Q = Quartile deviation.

The numerical value of Ku must always be between 0.00 and 0.50, although most values will fall within the limits of … and 0.31. For a normal distribution, formula 23.6 gives Ku = 0.263. If Ku is greater than 0.263, the distribution is platykurtic and if less than 0.263, the distribution is leptokurtic. Kurtosis can also be calculated by another formula (Downie & Heath 1970).
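The reference value of 0.263 for a normal distribution can be verified from the percentile points of the unit normal curve. The following minimal sketch in Python uses `statistics.NormalDist` from the standard library; the function name `percentile_kurtosis` is illustrative only.

```python
from statistics import NormalDist

def percentile_kurtosis(q1: float, q3: float, p10: float, p90: float) -> float:
    """Formula 23.6: Ku = Q / (P90 - P10), where Q = (Q3 - Q1) / 2
    is the quartile deviation."""
    q = (q3 - q1) / 2
    return q / (p90 - p10)

unit_normal = NormalDist(mu=0, sigma=1)
ku_normal = percentile_kurtosis(
    q1=unit_normal.inv_cdf(0.25),
    q3=unit_normal.inv_cdf(0.75),
    p10=unit_normal.inv_cdf(0.10),
    p90=unit_normal.inv_cdf(0.90),
)
print(f"Ku for a normal distribution = {ku_normal:.3f}")
```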


MEASURES OF RELATIVE POSITION: STANDARD SCORES

The relative position of a score means its distance from the mean, expressed in terms of some deviational measure such as the standard deviation. A standard score, which is a kind of derived score, is a method of expressing the distance of the score from the mean in terms of the standard deviation. The 'standard' about a standard score is that it has a fixed mean and a fixed standard deviation. Standard scores can be classified into two most common categories.


Linear Standard Scores 

The underlying purpose of transforming any original scores into standard scores is to make the scores on different tests comparable. There may be a linear or nonlinear transformation of the original scores. A linear standard score is one where a linear transformation of original scores is made. When standard scores are based upon linear transformation, they retain all the characteristics of the original raw scores because they are computed by subtracting a constant (such as the mean) from each raw score and then dividing the obtained value by another constant (such as the SD). Since all characteristics of the original raw scores are duplicated in linear standard scores, any statistical computation that can be done with the original raw scores can also be done with such standard scores. The most common examples of linear standard scores are the sigma scores (or z scores), Army General Classification Test (AGCT) scores, College Entrance Examination Board (CEEB) scores and Wechsler Intelligence Scale DIQs. These linearly derived standard scores can be compared among themselves only when they are obtained from distributions which have approximately similar shapes.


A sigma score is one which expresses how many standard-deviation units a particular score falls above or below the mean. To compute this, we subtract the mean from each original score and then divide the result by the standard deviation. A detailed discussion of the sigma score has already been done in Chapter 7. As discussed in that chapter, a z score or sigma score has two important limitations, namely, occurrence of negative values and occurrence of decimal fractions, which make a sigma score difficult to use in further statistical calculation as well as in reporting. To get rid of these difficulties, a further linear transformation of sigma scores is made. AGCT scores, CEEB scores and DIQs in the Wechsler Intelligence Scales are examples of such linear transformation. The AGCT scores employ a mean of 100 and a standard deviation of 20; CEEB scores employ a mean of 500 and a standard deviation of 100; and Wechsler DIQs employ a mean of 100 and a standard deviation of 15 for further linearly transforming the sigma scores. To convert the original sigma score into any of the above standard scores, we simply need to multiply the sigma score by the desired SD and add the desired mean value.
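The conversion just described amounts to two linear steps. The sketch below, in Python, applies them to a hypothetical raw score of 65 on a test with a mean of 50 and an SD of 10; these figures and the names used are assumptions made for illustration, while the scale means and SDs are those given in the text.

```python
def sigma_score(raw: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the mean in standard-deviation units."""
    return (raw - mean) / sd

def rescale(z: float, new_mean: float, new_sd: float) -> float:
    """Linear transformation of a sigma score: z * SD + mean."""
    return z * new_sd + new_mean

# (mean, SD) of each derived scale, as stated in the text
SCALES = {"AGCT": (100, 20), "CEEB": (500, 100), "DIQ": (100, 15)}

# Hypothetical examinee: raw score 65 on a test with mean 50 and SD 10
z = sigma_score(65, mean=50, sd=10)
converted = {name: rescale(z, m, s) for name, (m, s) in SCALES.items()}

print(f"sigma score = {z}")
for name, value in converted.items():
    print(f"{name}: {value}")
```

Because every step is linear, the examinee keeps the same relative position on each scale; only the unit and the origin change.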

Normalized Standard Scores 

Sometimes researchers may wish to compare the scores obtained from distributions having dissimilar shapes. In such situations, they employ some nonlinear transformations so that the scores may conveniently fit into any specified form of distribution, preferably the normal distribution. The reason for choosing a normal distribution is twofold. First, most of the characteristics encountered in behavioural researches are normally distributed among the population and second, a normal distribution facilitates further statistical computation. Thus, the normalized standard scores may be defined as those standard scores which are expressed in the form of a distribution that has been transformed in a way that fits a normal curve. Like linear standard scores, normalized standard scores can be expressed in terms of a mean of zero and a standard deviation of 1. If a person has obtained zero as a normalized standard score, it indicates that the performance of that person lies exactly at the mean and hence, he excels over 50% of the persons in his group. Likewise, if a person's performance is at +2SD units above the mean on the normal curve, it means he surpasses about 98% of the persons in his group. If his performance falls at -2SD units below the mean, it means he surpasses only about 2% of the persons in his group. The common examples of normalized standard scores are the T score and the Stanine score, which have already been discussed in detail in Chapter 7. The reader should note carefully that if the distribution of original scores is a normal one, the linear standard scores and the normalized standard scores would yield more or less identical results. A normalized standard score is preferred only if the situation meets the following requirements:

1. The sample is large.


2. The sample is representative. 


3. The non-normality of the distribution of original scores is not due to the behaviour or 
trait under consideration, rather it is due to some defects in the test material itself. 
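One common way of carrying out such a normalization is to convert each raw score to its percentile rank, read off the corresponding point on the unit normal curve, and re-express it on a convenient scale. The sketch below, in Python, follows that route and reports T scores with a mean of 50 and an SD of 10; the midpoint definition of percentile rank and the small set of scores are assumptions made for illustration, not a procedure prescribed in this chapter.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist(mu=0, sigma=1)

def normalized_t_scores(scores):
    """Map each distinct raw score to a normalized T score (mean 50, SD 10)."""
    n = len(scores)
    result = {}
    for x in sorted(set(scores)):
        below = sum(1 for s in scores if s < x)
        equal = sum(1 for s in scores if s == x)
        # Midpoint percentile rank expressed as a proportion (always between 0 and 1)
        proportion = (below + 0.5 * equal) / n
        z = UNIT_NORMAL.inv_cdf(proportion)   # normalized standard score
        result[x] = 50 + 10 * z
    return result

# Hypothetical positively skewed raw scores
raw_scores = [1, 2, 2, 3, 3, 3, 4, 10]
for raw, t in normalized_t_scores(raw_scores).items():
    print(f"raw {raw:>2} -> T {t:.1f}")

# Proportion of the group surpassed at +2 SD and at -2 SD
print(f"{UNIT_NORMAL.cdf(2):.3f}, {UNIT_NORMAL.cdf(-2):.3f}")
```

The last line confirms the figures quoted above: a normalized standard score of +2 surpasses about 98% of the group and a score of -2 about 2%.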


PARAMETRIC AND NONPARAMETRIC STATISTICAL TESTS 


The parametric and nonparametric statistical tests are commonly employed in behavioural researches. A parametric statistical test is one which specifies certain conditions about the parameter of the population from which a sample is taken. Such statistical tests are considered to be more powerful than nonparametric statistical tests and should be used if their basic requirements or assumptions are met. These assumptions are based upon the nature of the population distribution as well as upon the type of measurement scales used in quantifying the data. The assumptions may be enumerated as follows:

1. The observations must be independent. In other words, the selection of one case must not be dependent upon the selection of any other case.

2. The observations must be drawn from a normally distributed population.

3. The samples drawn from a population must have equal variances and this condition is more important if the size of the sample is particularly small. When the different samples taken from the same population have equal or nearly equal variances, this condition is known as homogeneity of variance. Statistically speaking, homogeneity of variance means that there should not be a significant difference among the variances of different samples.

4. The variables must be expressed in interval or ratio scales. Nominal measures (that is, frequency counts) and ordinal measures (that is, rankings) do not qualify for a parametric statistical test.

5. The variable under study should be continuous.

The examples of a parametric test are the z test, t test and F test.

A nonparametric statistical test is one which does not specify any conditions about the parameter of the population from which the sample is drawn. Since these statistical tests do not make any specified and precise assumption about the form of the distribution of the population, these are also known as distribution-free statistics. The nonparametric statistics do not specify any conditions like parametric statistical tests, although certain assumptions are associated with them. For a nonparametric statistical test, the variable under study should be continuous and the observations should be independent. But these assumptions are neither so rigid nor so elaborate as those of a parametric statistical test. The examples of nonparametric tests include the Mann-Whitney U test and Kendall's tau. A nonparametric statistical test should be used only in the following situations:

1. The shape of the distribution of the population from which a sample is drawn is not known to be a normal one.

2. The variables have been quantified on the basis of nominal measures (or frequency counts).

3. The variables have been quantified on the basis of ordinal measures (or ranking).

Because nonparametric statistical tests are based upon counting or ranking rather than on the measured values, they are less precise and are less likely to reject a null hypothesis when it is false. Some statisticians, however, argue that nonparametric tests are more powerful and have more merits than parametric tests because their validity is not based upon assumptions about the population distribution. They argue that the parametric assumptions are often ignored by researchers, and there is evidence for certain parametric statistical tests like the t test and F test that violation of assumptions, particularly when a sample is large, does not affect the power of the statistical tests (Edward 1968; Winer 1971). Not only this, for some population distributions nonparametric statistical tests are superior in power to parametric statistical tests (Whitney 1948).

Table 23.1 presents a summary of the levels of quantitative description and types of statistical analysis appropriate for each level of measurement.


Table 23.1: Levels of quantitative description of parametric and nonparametric data

| Level | Description | Type of test | Appropriate statistics |
| Nominal scale | Classified and counted | Nonparametric | Mode, chi-square test and sign test |
| Ordinal scale | Rank in order | Nonparametric | Median, Q, Stanine, Spearman's rho, Kendall tau, Mann-Whitney U test, Wilcoxon Signed Rank test, Kendall's partial rank order correlation |
| Interval scale | Equal intervals, no true zero point | Parametric | Mean, standard deviation, Pearson r, t test, ANOVA, ANCOVA, factor analysis |
| Ratio scale | Equal intervals, true zero, ratio relationship | Parametric | Mean, standard deviation, Pearson r, t test, ANOVA, ANCOVA, factor analysis |


Students should keep in mind the fundamental differences between parametric and nonparametric tests.

• In parametric test, there are specific assumptions about population parameters whereas in nonparametric test, there are no such assumptions.

• In parametric test, the measure of central tendency is the mean whereas in nonparametric test, the measure of central tendency is the median.

• In parametric test, there is complete information about the population whereas in nonparametric test, there is no information about the population.

• Parametric tests are applicable to variables only whereas the nonparametric tests are applicable to both variables and attributes.

• In parametric test, the measurement of variables is done on interval or ratio level whereas in nonparametric test, the variables are measured on nominal or ordinal level.

• In parametric test, the test statistic is based upon the distribution whereas in nonparametric test, the test statistic is arbitrary.


Researches have made it clear that many parametric statistics such as the t test, ANOVA and ANCOVA are appropriate even when the assumption of normality is violated. This has been demonstrated especially for the t test, ANOVA and ANCOVA by researchers like Glass, Peckham and Sanders (1972). Not only this, these statistical procedures, which are suited for interval data and ratio data, can be applied to ordinal data as well as to dichotomous data (such as pass-fail, male-female, etc.) (Best & Kahn 2006).

Bradley (1968) has enumerated several advantages and disadvantages of nonparametric statistical tests in comparison to parametric statistical tests. Some of the important advantages are as follows:

1. Simplicity and facilitation in derivation: Most of the nonparametric statistics can be derived by using simple computational formulas. This advantage does not lie with most of the parametric statistics, the derivation of which requires an advanced knowledge of mathematics.

2. Wider scope of application: Since nonparametric statistics, as compared to parametric statistics, are based upon fewer and less rigid and elaborate assumptions regarding the form of population distribution, they can be easily applied to much wider situations.

3. Speed of application: When the sample size is small, calculation of nonparametric statistics is faster than that of parametric statistics.

4. Susceptibility to violation of assumptions: In the case of nonparametric statistics, assumptions are fewer and less elaborate than in the case of parametric statistics. Therefore, assumptions of nonparametric statistics are less susceptible to violation. Not only this, these violations are easier to check and can be readily and economically taken care of with the nonparametric statistics.

5. Type of measurement required: Nonparametric statistics require measurement based upon a nominal scale and ordinal scale, whereas parametric statistics require measurement based upon the interval scale and/or ratio scale. As treatments associated with either the nominal scale or the ordinal scale are easier than treatments associated with either the interval scale or the ratio scale, the nonparametric statistics have a better case for applicability than the parametric statistics.

6. Impact of sample size: When the sample size is 10 or less, nonparametric statistics are easier, quicker and more efficient than the parametric statistics. If the assumptions of parametric statistics are violated for such small samples, the result is likely to get badly affected. Therefore, for this sample size, nonparametric statistics are always superior to the parametric statistics. The reader should note that as the sample size increases, nonparametric statistics become time-consuming, labour-intensive and less efficient than the parametric statistics.

7. Statistical efficiency: Nonparametric tests are often more convenient than the parametric tests. If the data are such that they meet all assumptions of nonparametric statistics but not of parametric statistics, then nonparametric statistics have statistical efficiency equal to parametric statistics. If both parametric and nonparametric statistics are applied to data which fulfil all assumptions of parametric tests, the distribution-free statistics become more efficient with a small sample size but they become less and less efficient as the sample size increases.


Despite these advantages (which, in general, point towards disadvantages of parametric statistics), certain disadvantages are associated with nonparametric statistics. The main ones are as given below.

1. According to Moses, the nonparametric statistics have lower statistical efficiency than parametric statistics when the sample size is large, preferably above 30.

2. If all assumptions of parametric statistics are fulfilled, Siegel and Castellan (1988) consider the use of nonparametric statistics as simply 'wasteful of data'.

3. It is also said that the probability tables for testing the significance of nonparametric statistics are widely scattered in different publications which, for a behavioural scientist, are difficult to locate and interpret.

Where both parametric statistics and nonparametric statistics are available, the following guidelines should be kept in view for selecting any one of the two:

(i) If possible and the situation permits, parametric statistics should be used because the parametric statistics will prove beneficial in most of the cases.

(ii) The central limit theorem clearly ensures that parametric statistics can be applied even for non-normal distributions.

(iii) If the data are available from a non-normal distribution, the same can be transformed so that parametric statistics can be used (Hollander et al. 2014).

(iv) For larger samples (N > 40), parametric tests should be used.

(v) If the sample size is very small, there is no alternative to using a nonparametric statistic.

Let us discuss some important parametric and nonparametric statistics commonly used in behavioural researches.


PARAMETRIC STATISTICS 
The most important parametric statistics are:
1. Student's t test and z test
2. Analysis of variance: F ratio
3. Analysis of covariance
4. Pearson r
5. Partial correlation and Multiple correlation
A detailed discussion of each of them is presented below.


1. Student's t test and z test


When the researcher wants to test the significance of the difference between two means, he uses either the t test (or t ratio) or the z test (or z ratio). The computation of t or z involves the computation of a ratio between the experimental variance (that is, the obtained difference between two means) and the error variance (that is, the standard error of the mean difference). However, there are two basic differences between the t ratio and the z ratio. When the sample size is less than 30, we use the t test or Student's t for testing the significance of the difference between two means. This concept of a small-sample test was developed in 1908 by William Sealy Gosset, a statistician for Guinness Breweries in Dublin, Ireland. Because the service code prohibited publication under a researcher's name, he signed the name 'Student' for publication of this test. Hence, this statistic is known as Student's t. When the sample size is more than 30, the ratio of the difference between two means to the standard error of that difference is the z ratio, which is interpreted through the use of the normal probability tables. Another difference is that the z test uses the actual population standard deviation whereas the t test uses the sample standard deviation as an estimate when the actual population standard deviation is not known. The reader should note that as the sample size increases, the critical values of t necessary for rejecting the null hypothesis gradually reduce and approach the z values of the normal probability table.

There are some basic assumptions of the t test. First, the variances of the samples are nondifferent, that is, there exists homogeneity in the variances of the samples. Second, the sample has been randomly selected. This condition helps to ensure that the sample is representative of the population so that accurate generalizations can be made from sample to population. Third, the population distribution of scores is normal. In fact, this requirement stems from a property of the normal population. When it is stated that the observations are independent and random, the sample means and sample variances can be independent only when the population is normally distributed. The t ratio as the test of significance of means demands that the means and variances be independent of each other and should not vary in some systematic way, that is, one (mean) becoming larger with the other (variance) becoming larger. If these tend to vary in a systematic way, the underlying population cannot be normal and a test of significance based on the assumption of normality would be treated as invalid. It is for this reason that the assumption of normality lies behind the use of the t test.


Student's t test is referred to as a robust test, which means that statistical inferences are likely
to be valid even when there are large departures from normality in the population. The t test is
likely to be robust to violations of normality when large samples (N > 30) are used. Therefore, in
case of any serious doubts concerning the normality of the population distribution, it is wise to
increase N in each sample (Elifson 1990). Before discussing the small-sample t test, five important
concepts, namely, degree of freedom, null hypothesis, level of significance, one-tailed test vs
two-tailed test, and power of test, should be considered.


Degree of freedom 


The degree of freedom means freedom to vary. It is abbreviated as df. In statistical language, it can
be said that the degree of freedom is the number of observations that are independent of each
other and that cannot be deduced from each other. Suppose we have five scores and the mean of the
five scores is 10. The fifth score immediately makes adjustment with the remaining four scores in
a way which assures that the mean of all five scores must be 10. For example, suppose we have
four scores 12, 18, 5, 12; then the fifth score must be 3 so that the mean is 10. In
another distribution, if the four scores are 2, 10, 8, 5, the fifth score must be 25 in order to have a
mean of 10. The meaning is that four scores in the distribution are independent, or they may have
any value, and they cannot be deduced from each other. The size of the fifth score, however, is
fixed because the mean in each case is 10. Hence df = N − 1 = 5 − 1 = 4. Take an example of
larger cases. Suppose we have a set of 101 scores. We compute the mean and, in computing the
mean, we lose 1 df. We had initially 101 df (because there were 101 scores), but after computing the
mean we have N − 1 = 101 − 1 = 100 degrees of freedom. Sometimes we have paired data.
In such cases, the number of degrees of freedom is equal to one less than the number of pairs.
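The five-score example above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `last_score` is hypothetical and not part of the text. It shows that once the mean and N − 1 scores are fixed, the remaining score has no freedom to vary.

```python
def last_score(free_scores, mean, n):
    """Return the single score that is forced once n - 1 scores and the mean are known."""
    return mean * n - sum(free_scores)

# Two sets of four freely chosen scores, both with a required mean of 10 over five scores.
fifth_a = last_score([12, 18, 5, 12], mean=10, n=5)
fifth_b = last_score([2, 10, 8, 5], mean=10, n=5)

# One degree of freedom is lost in fixing the mean.
df = 5 - 1

print(fifth_a, fifth_b, df)
```

Whatever four values are chosen, the fifth is determined, which is why only four of the five scores count as free.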


Null hypothesis 


The starting point in all statistical tests is the statement of the null hypothesis ($H_0$), which is a
no-difference hypothesis about the groups under study. It makes a judgement about whether the obtained
differences between the samples are due to some true differences or to some chance errors. The null
hypothesis is formulated for the express purpose of being rejected, because if it is rejected, the
alternative hypothesis ($H_1$), which is an operational statement of the investigator's research
hypothesis, is accepted. As we know, a research hypothesis is nothing but predictions or
deductions drawn from a theory. The tests of the null hypothesis are generally called tests of
significance, the outcome of which is stated in terms of probability figures or levels of
significance.





If the difference between the experimental group and the control group is not significant, the
experimenter is likely to accept the null hypothesis, indicating the fact that the difference
between these groups appears to be due to sampling error or some other chance factor. On the
other hand, if the difference is significant, the experimenter rejects the null hypothesis,
indicating the fact that the obtained differences among the samples under study are real.


Level of significance

As stated above, the null hypothesis is developed for the express purpose of rejection. The
rejection or acceptance of the null hypothesis is based upon the level of significance, which is used
as a criterion. The levels of significance are also known as alpha levels. In psychological,
sociological and educational researches there are two levels of significance which are commonly
used for testing the null hypotheses. One is the 0.05 or 5% level and another is the 0.01 or 1% level.
If the null hypothesis is rejected at the 0.05 level, it means that 5 times in
100 replications of the experiment the null hypothesis is true, and 95 times this hypothesis would
be false. In other words, this suggests that a 95% probability exists that the obtained results are
due to the experimental treatment rather than due to some chance factors. Rejection of the null
hypothesis when, in fact, it is true constitutes a Type I error or alpha error. Thus it can be said that
at the 0.05 level of significance, the experimenter commits a 5% Type I error when he rejects a
null hypothesis. Some investigators may want a more stringent test, and the 0.01 level is one such
level where the investigator commits a Type I error of 1% only. The 0.01 level suggests that a 99%
probability exists that the obtained results are due to the experimental treatment, and hence only
once in 100 replications of the experiment would the null hypothesis be true. Sometimes the
investigator wants an even more stringent test of significance and, for this, he chooses the 0.001
level, which is uncommonly used in behavioural researches. This level suggests that only once in
1000 replications of the experiment would the null hypothesis be a true one, and in 999
replications (out of 1000) the obtained results can be attributed to the experimental treatment. In
testing the significance of the obtained statistics, sometimes the investigator accepts the
null hypothesis when, in fact, it is false. This error is technically known as the Type II error or
beta error.
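The idea of "replications of the experiment" can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the author's procedure: it draws both groups from the same normal population (so the null hypothesis is true by construction), uses a z test with a known population standard deviation, and counts how often the null hypothesis is wrongly rejected. Read strictly, the alpha level is this long-run proportion of rejections when the null hypothesis is true. The population values (mean 50, SD 10), the group size and the seed are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

def type_one_error_rate(alpha=0.05, n=25, replications=4000, seed=1):
    """Proportion of replications in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    sigma = 10.0                                    # assumed known population SD
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    se_diff = (2 * sigma ** 2 / n) ** 0.5           # SE of the difference between two means
    rejections = 0
    for _ in range(replications):
        # Both groups come from the same population, so H0 is true.
        group_1 = [rng.gauss(50, sigma) for _ in range(n)]
        group_2 = [rng.gauss(50, sigma) for _ in range(n)]
        z = (fmean(group_1) - fmean(group_2)) / se_diff
        if abs(z) > critical_z:
            rejections += 1
    return rejections / replications

rate_05 = type_one_error_rate(alpha=0.05)
rate_01 = type_one_error_rate(alpha=0.01)
print(rate_05, rate_01)
```

With enough replications the two proportions should settle near 0.05 and 0.01 respectively, which is the sense in which a stricter alpha level guards against the Type I error.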

It is obvious from the above interpretation that an error of Type I can be reduced by putting
the level of significance at the 0.01 or 0.001 level. But as we reduce the chance of making a Type I
error, we increase the chance of making a Type II error, where we do not reject the null hypothesis
when it should be rejected. Therefore, as we decrease the possibility of making one type of error,
we also increase the probability of making another type of error. The research workers must be
cautious in this situation and should try, as far as possible, to limit the probability of making
either type of error.


What is the relationship between Type I and Type II errors? Researchers are of the view that,
at the commonly used significance levels, protecting against one kind of error leads to an
increased risk of the other. The insurance policy against Type I error (that is, lowering
the alpha level to 0.01 or 0.001) has the cost of increasing the probability of
committing Type II error. This happens because with a stringent significance level like
0.001, even if the research hypothesis is true, the statistical results must be quite strong to be
extreme enough to reject the null hypothesis. On the other hand, the safeguard or insurance policy
against Type II error (setting a more lenient alpha level) has the cost of increasing the
probability of Type I error, because even a weak statistical result is then enough for rejecting
the null hypothesis. Thus the trade-off between these two conflicting concerns is usually put to
rest by a compromise: adopting the standard 0.05 (5%) or 0.01 (1%) level of significance. The
possible correct and mistaken conclusions in hypothesis testing have been presented in Table 23.2.


Table 23.2 Statistical decision making: Possible correct and incorrect decisions

Decision                  Null hypothesis ($H_0$) true    Null hypothesis ($H_0$) false
Fail to reject $H_0$      Correct decision                Type II error
Reject $H_0$              Type I error                    Correct decision


At the top of Table 23.2, there are two possibilities: $H_0$ is true and $H_0$ is false. Likewise, along
the left side of the table, there are also two possibilities: Fail to reject $H_0$ and Reject $H_0$.
Table 23.2 clearly shows that there are two ways to be correct and two ways to be in error in any
hypothesis-testing or decision-making situation.

One-tailed test vs two-tailed test 
A one-tailed test is a directional test, which indicates the direction of the difference between the
samples under study. Suppose the experimenter conducts an experiment in which he takes two
groups: one is the control group and the other is the experimental group. Only the experimental
group is given training for five days on various kinds of arithmetical operations. Subsequently, an
arithmetical ability test is administered to the two groups and the scores are obtained. In the
above situation, the experimenter has reason to say that the mean arithmetical score of the
experimental group will be higher than the mean arithmetical score of the control group. This is
the alternative hypothesis, which indicates the direction of difference. When the alternative
hypothesis states the direction of difference, it constitutes a one-tailed test. The null hypothesis
would be that the mean of the experimental group is equal (no difference) to the mean of the
control group. If this is rejected, we accept the above alternative hypothesis.


Putting the above facts schematically, we can say

1. $H_0: M_1 = M_2$ (no difference between $M_1$ and $M_2$)

Alternative hypothesis:

2. $H_1: M_1 \neq M_2$
3. $H_1: M_1 < M_2$
4. $H_1: M_1 > M_2$

where $M_1$ = mean of the experimental group; and
$M_2$ = mean of the control group.
When it is said that the mean of the experimental group will be higher than the mean of the
control group, we are concerned with only one end of the distribution. Putting it in terms of a
normal curve, we are concerned with only one end of the curve (see Figure 23.5). When the
alpha level is set at the 0.05 level, we have 5% of the area of the normal curve all in one tail rather
than having it distributed equally into the two tails of the curve. Therefore, a test of a directional
hypothesis is called a one-tailed test. A simple inspection of the table of areas of the normal curve
given at the end of this chapter reveals that a z score of 1.64 cuts off 5% of the area under the normal
curve in the smaller part, and similarly a z score of 2.33 cuts off 1% of the area in the smaller part.
If the null hypothesis is rejected, that is, hypothesis 1 is not tenable, we automatically accept the
alternative hypothesis. If the experimenter has somehow reason to believe that the experimental
group would have a lower mean score than the control group (alternative hypothesis), he can set
up a directional hypothesis that the mean of the experimental group is lower than the mean of the
control group (one-tailed test). Rejection of this hypothesis would automatically lead to the
acceptance of the above null hypothesis. This time the area of the normal curve in which the
experimenter is interested is 5% or 1% of the area in the left-hand tail of the normal curve. When
the null hypothesis is rejected by using a one-tailed test, we say that we are rejecting the
hypothesis at 1% or 5% points, not levels.





[Figure: normal curve with 5% of the area in the right-hand tail, beyond z = +1.64]

Fig. 23.5 One-tailed test at 0.05 or 5% point


A two-tailed test is one in which the investigator is interested in evaluating the difference
between the groups. The direction of difference is of no importance here. The null hypothesis
will be that the mean of the experimental group is equal to the mean of the control group, that is,
there is no difference between the means of the experimental group and the control group. The
alternative hypothesis will be that the mean of the experimental group is not equal to the mean
of the control group. Thus, we show our concern with both tails of the distribution. At the 5%
level, we have 5% of the area under the normal curve equally divided between both the tails (see
Figure 23.6). From the table of areas of the normal curve it can easily be read that a z score of
+1.96 cuts off 2.5% of the area in the right-hand tail, and a z score of −1.96 would also cut off
2.5% of the area in the left-hand tail. In a two-tailed test, a negative z score is interpreted in
the same way as a positive z score. When the null hypothesis is rejected by a two-tailed test, it is
said that we have rejected it at 5% or 1% levels and not at 5% or 1% points.


Fig. 23.6 Two-tailed test at 0.05 or 5% level 
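The cut-off values quoted for one-tailed and two-tailed tests (1.64, 2.33 and 1.96) can be recovered from the standard normal distribution instead of a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed test: the whole alpha area sits in a single tail.
one_tailed_05 = std_normal.inv_cdf(1 - 0.05)
one_tailed_01 = std_normal.inv_cdf(1 - 0.01)

# Two-tailed test: alpha is split equally, half in each tail.
two_tailed_05 = std_normal.inv_cdf(1 - 0.05 / 2)
two_tailed_01 = std_normal.inv_cdf(1 - 0.01 / 2)

print(round(one_tailed_05, 3), round(one_tailed_01, 3))
print(round(two_tailed_05, 3), round(two_tailed_01, 3))
```

Because the two-tailed critical value is always the larger of the pair at a given alpha, the same obtained z can be significant under a one-tailed test and not under a two-tailed test.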


There are some problems with the one-tailed test, and a good researcher must take those problems
into account. One important problem with one-tailed tests is that they allow the researcher to
reject the null hypothesis ($H_0$) even when the difference between the sample and the population is
relatively small. In other words, one-tailed tests make it too easy to make a Type I error (rejecting
a true null hypothesis). Therefore, most researchers consider one-tailed tests as improper. That is
the reason why two-tailed tests are more acceptable and generally preferred.


Another problem with the one-tailed test arises from the fact that it looks for a treatment effect
in one direction only. Let us illustrate it with an example. Suppose the researcher wants to
examine the effect of background music on the productivity of factory workers. The productivity
is measured in terms of mean performance over a period of 30 days. Further, suppose that the
researcher develops a one-tailed test stating that background music will tend to enhance the
mean performance of the factory workers. In this case, since the one-tailed test has no critical
region on the left-hand side of the mean (in the normal probability curve), it would not detect a
significant decrease in productivity. If the researcher had developed a two-tailed test, it would
have been sensitive to a change in either direction. For this reason, most researchers don't like to
use the one-tailed test.
Power of test 


As we have just considered, there is an inverse relation between the likelihood of committing
Type I and Type II errors. In other words, a decrease in Type I error will increase Type II error for a
given sample of N elements. If the researcher wishes to reduce both Type I and Type II errors, he
must increase N. Different statistical tests offer the possibility of different balances
between these two types of errors, and for obtaining this balance the power of a statistical test is
very important. The power of a statistical test is the probability of rejecting the null
hypothesis when, in fact, it is false. In terms of an equation, it can be stated like this:

Power = 1 − probability of Type II error

The power of a test increases with an increase in the size of N. It is clear from
the above interpretation that the power of a test and the probability of Type II error are inversely
related. When a treatment effect exists, the hypothesis testing can have any one of the two
conclusions:

(a) It can fail to identify the treatment effect (Type II error), or
(b) It can correctly identify the treatment effect (rejecting a false null hypothesis).

If, for example, a hypothesis test has a 20% chance of failing to identify the treatment
effect, then it must have an 80% chance of correctly identifying the treatment effect. Thus the
power of a statistical test is determined by 1 − probability of Type II error.

Researchers have identified three factors that tend to affect the power of a statistical test:
alpha level, choice of one-tailed or two-tailed test, and size of the sample. A discussion follows.

(a) The alpha level: Alpha level is the level of significance. When the alpha level is reduced,
the risk of Type I error (rejecting a true $H_0$) is also reduced. Since a small alpha level
generally reduces the probability of rejecting the null hypothesis, it also reduces the
statistical power of the test.

(b) One-tailed vs. two-tailed tests: As we know, one-tailed tests make it easier to
reject the null hypothesis. Since one-tailed tests increase the probability of rejecting $H_0$,
they also tend to increase the power of the test.

(c) Sample size: The power of a statistical test is also affected by the size of the sample. As
we know, when the size of the sample is larger, it tends to represent the population in a
better way. If there is a real treatment effect in the population, a larger sample will be
more likely to locate it than a smaller sample. Therefore, the power of a test is enhanced by
increasing the size of the sample.

Thus we find that the power of a test is influenced by various factors.
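The effect of the three factors on power can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions rather than a procedure from the text: it uses a z test with a known population standard deviation, a true mean shift in the positive direction, and hypothetical values (shift of 5 points, SD of 10, samples of 25 and 100). The function name `power_z_test` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_z_test(effect, sigma, n, alpha=0.05, two_tailed=True):
    """Power = 1 - P(Type II error) for a z test when the true mean shift is `effect` (> 0)."""
    delta = effect * sqrt(n) / sigma            # expected z value when the effect is real
    if two_tailed:
        z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
        upper = 1 - STD_NORMAL.cdf(z_crit - delta)   # rejections in the right-hand tail
        lower = STD_NORMAL.cdf(-z_crit - delta)      # (rare) rejections in the left-hand tail
        return upper + lower
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - delta)

base = power_z_test(effect=5, sigma=10, n=25)                       # alpha 0.05, two-tailed
stricter_alpha = power_z_test(effect=5, sigma=10, n=25, alpha=0.01)  # (a) smaller alpha
one_tailed = power_z_test(effect=5, sigma=10, n=25, two_tailed=False)  # (b) one-tailed
larger_n = power_z_test(effect=5, sigma=10, n=100)                  # (c) larger sample

for label, value in [("base", base), ("alpha = 0.01", stricter_alpha),
                     ("one-tailed", one_tailed), ("n = 100", larger_n)]:
    print(f"{label:>12}: power = {value:.3f}, P(Type II) = {1 - value:.3f}")
```

Under these assumptions, lowering alpha reduces power, while a one-tailed test or a larger sample raises it, in line with points (a) to (c).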


Now, we are in a position to proceed ahead for showing the calculation of the t ratio from
different groups. Ordinarily, three types of situations arise while one is calculating the t ratio.

1. t ratio from independent groups
2. t ratio from correlated groups
3. t ratio from matched groups


Regardless of the nature of the group, the t ratio is calculated by the following equation:

$$t = \frac{(M_1 - M_2) - 0}{SE_D} \qquad (23.7)$$

where $M_1$ = mean of the first group;
$M_2$ = mean of the second group;
$SE_D$ = standard error of the difference between two sample means.

In each of the above three types of groups, the formula for $SE_D$ varies and hence, we shall
illustrate the calculation of the t ratio from each of the above types of groups separately.


t ratio from independent groups: Two groups are said to be independent when no correlation
exists between them. Suppose one group of boys (N = 20) and one group of girls (N = 22) were
administered a mechanical reasoning test. Their data were summarized as given below.





          Boys            Girls
Mean      $M_1$ = 34.56   $M_2$ = 30.56
SD        $SD_1$ = 5.68   $SD_2$ = 6.98
N         $N_1$ = 20      $N_2$ = 22

Do the two groups differ on the measure of mechanical reasoning? This question can be answered
by computing the t ratio. For independent groups, $SE_D$ may be calculated with the help of the
following equation:

$$SE_D = \sqrt{SE_{M_1}^2 + SE_{M_2}^2} \qquad (23.8)$$

where $SE_D$ = standard error of the difference between means,
$SE_{M_1}$ = standard error of the first mean, and
$SE_{M_2}$ = standard error of the second mean.

Here the standard error of a mean can be calculated as given below:

$$SE_M = \frac{SD}{\sqrt{N - 1}} \qquad (23.9)$$

Substituting the values in Equation 23.9, we have

$$SE_{M_1} = \frac{5.68}{\sqrt{19}} = 1.302; \qquad SE_{M_2} = \frac{6.98}{\sqrt{21}} = 1.524$$

Now $SE_D$ becomes equal to:

$$SE_D = \sqrt{(1.302)^2 + (1.524)^2} = 2.004$$

$$t = \frac{(M_1 - M_2) - 0}{SE_D} = \frac{(34.56 - 30.56) - 0}{2.004} = \frac{4.00}{2.004} = 1.996$$

$$df = (N_1 - 1) + (N_2 - 1) = (20 - 1) + (22 - 1) = 40$$

Entering the probability table of t ratios at df = 40, we find that a t of 2.021 is required for
significance at the 0.05 level with a two-tailed test, and 1.684 at the 0.05 point with a one-tailed
test. The obtained t of 1.996 therefore exceeds the one-tailed value but falls just short of the
two-tailed value, and it is far from the value required at the 0.01 level. Hence, the null
hypothesis can be rejected only if a directional (one-tailed) hypothesis was stated in advance;
with a two-tailed test, the difference between the means of the groups of boys and girls is not
significant at the 0.05 level.

t ratio from correlated groups: The correlated groups are those which exhibit some
correlation with each other on the given measures. One of the most convenient ways of getting
correlated means is to repeat the same test on the same group twice on two different occasions. In
between the initial and the final administration of the test, some experimental treatment is given
to the group. This is known as the repeated measure t test. Because the group and the test are the
same, it is highly probable that there will be a correlation between the initial measures and the
final measures. Suppose a group of 20 students of Class IV is administered an English spelling test
in January. The obtained mean score and standard deviations are given below. After a
year's training in spelling, they are again given the same test. This time their mean is considerably
raised and the standard deviation is lowered. The correlation coefficient between the initial and
final set of scores was positive and significant.

                              Initial test   Final test
Mean                          36.28          40.33
SD                            4.28           2.32
N                             20             20
Coefficient of correlation    0.80

Does training produce a significant difference between the initial mean and the final mean?

In case of testing the significance of the mean difference between two correlated means, $SE_D$ is
calculated with the help of the following formula:

$$SE_D = \sqrt{SE_{M_1}^2 + SE_{M_2}^2 - 2 r_{12} SE_{M_1} SE_{M_2}} \qquad (23.10)$$

where $r_{12}$ = coefficient of correlation between the initial set of scores and the final set of
scores. The rest of the subscripts are defined like those in Equation 23.8.

The standard error of mean ($SE_M$) is calculated by Equation 23.9. Thus,

$$SE_{M_1} = \frac{4.28}{\sqrt{19}} = 0.982; \qquad SE_{M_2} = \frac{2.32}{\sqrt{19}} = 0.532$$

$$SE_D = \sqrt{(0.982)^2 + (0.532)^2 - 2(0.80)(0.982)(0.532)} = \sqrt{0.9643 + 0.2830 - 0.8358} = 0.641$$

$$t = \frac{(M_1 - M_2) - 0}{SE_D} = \frac{(36.28 - 40.33) - 0}{0.641} = \frac{-4.05}{0.641} = -6.32$$

$$df = N - 1 = 20 - 1 = 19$$

Entering the probability table of t ratios at df = 19, we find that our obtained value of t
exceeds the value of t at even the 0.001 level of significance. Hence, the null hypothesis is
rejected and it is concluded that the training has produced a significant difference between the
mean of the initial set of scores and the mean of the final set of scores.

t ratio from matched groups: Sometimes it becomes necessary for the researchers to match
the groups under study. Matching can be done on the basis of numbers or it can be done in terms
of mean and standard deviation. When the matching is done on the basis of the number of the
subjects, each person has his corresponding match in the other group and therefore, the number
of persons in the two matched groups is always equal. When matching is done in terms of mean
and standard deviation, the number in the two groups may or may not be equal.

Suppose we take two groups from two different classes and each group is compared on the
basis of a numerical reasoning test. Both groups have been matched in terms of mean and standard
deviation on the basis of scores on a general intelligence test. Do the groups differ in terms of
mean numerical ability? The data are given below.





                                         Group 1    Group 2
N                                        100        120
Means on Intelligence Tests              …0.26      70.…
SDs on Intelligence Tests                9.98       10.09
Means on Numerical Reasoning Tests       55.62      60.34
SDs on Numerical Reasoning Tests         8.67       7.57

Coefficient of correlation between General Intelligence Test scores
and Numerical Reasoning Test scores = 0.45


The equation for calculating $SE_D$ in case of groups matched in terms of means and SD is
given below.

$$SE_D = \sqrt{(SE_{M_1}^2 + SE_{M_2}^2)(1 - r^2)} \qquad (23.11)$$

where subscripts are defined as usual. Now, we can proceed as under:

$$SE_{M_1} = \frac{8.67}{\sqrt{100}} = 0.867; \qquad SE_{M_2} = \frac{7.57}{\sqrt{120}} = 0.691$$

$$SE_D = \sqrt{[(0.867)^2 + (0.691)^2](1 - 0.45^2)}$$
$$= \sqrt{[(0.7516) + (0.4777)](1 - 0.2025)}$$
$$= \sqrt{(1.2293)(0.7975)} = 0.991$$

$$t = \frac{(M_1 - M_2) - 0}{SE_D} = \frac{(55.62 - 60.34) - 0}{0.991} = \frac{-4.72}{0.991} = -4.762$$

$$df = (N_1 - 1) + (N_2 - 1) - 1 = (100 - 1) + (120 - 1) - 1 = (99 + 119) - 1 = 218 - 1 = 217$$

Entering the probability table of t ratios at df = 217, we find by interpolation that the obtained
value of t exceeds the t value required at even the 0.001 level of significance. Rejecting the null
hypothesis, we conclude that the two groups differed significantly in terms of mean numerical
ability.
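The three worked examples differ only in how the standard error of the difference is formed, which a short sketch can make explicit. The code below is an illustration, not the author's procedure. The input figures are read from the worked examples; where the print is unclear they are assumptions chosen to agree with the printed standard errors (an initial SD of 4.28 in the correlated example and a second SD of 7.57 in the matched example). The helper names `se_mean` and `t_ratio` are hypothetical.

```python
from math import sqrt

def se_mean(sd, n, use_n_minus_1=True):
    """Standard error of a mean: SD / sqrt(N - 1) (Equation 23.9), or SD / sqrt(N)."""
    return sd / sqrt(n - 1 if use_n_minus_1 else n)

def t_ratio(m1, m2, se_d):
    """Equation 23.7: difference between means, minus the hypothesised 0, over SE_D."""
    return ((m1 - m2) - 0) / se_d

# 1. Independent groups (Equation 23.8): boys vs girls on mechanical reasoning.
se1, se2 = se_mean(5.68, 20), se_mean(6.98, 22)
t_independent = t_ratio(34.56, 30.56, sqrt(se1 ** 2 + se2 ** 2))
df_independent = (20 - 1) + (22 - 1)

# 2. Correlated groups (Equation 23.10): same 20 pupils tested twice, r12 = 0.80.
r12 = 0.80
se1, se2 = se_mean(4.28, 20), se_mean(2.32, 20)
t_correlated = t_ratio(36.28, 40.33, sqrt(se1 ** 2 + se2 ** 2 - 2 * r12 * se1 * se2))
df_correlated = 20 - 1

# 3. Groups matched on mean and SD (Equation 23.11), r = 0.45.
#    The worked example divides by sqrt(N) here, so the same is done below.
r = 0.45
se1 = se_mean(8.67, 100, use_n_minus_1=False)
se2 = se_mean(7.57, 120, use_n_minus_1=False)
t_matched = t_ratio(55.62, 60.34, sqrt((se1 ** 2 + se2 ** 2) * (1 - r ** 2)))
df_matched = (100 - 1) + (120 - 1) - 1

print(round(t_independent, 3), df_independent)
print(round(t_correlated, 3), df_correlated)
print(round(t_matched, 3), df_matched)
```

Small differences in the third decimal place from the hand calculation arise because the code carries full precision instead of rounding each standard error first.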


2. Analysis of Variance: F Ratio 


The t ratio or z ratio is one of the powerful parametric tests through which we can test the
significance of the difference between two means. There are two general limitations of the t ratio.
First, when there are several groups and we want to test the significance of the mean difference
among them, several t ratios are required to be computed. For example, suppose there are five
groups. Then we need to compute

$$\frac{N(N - 1)}{2} = \frac{5 \times 4}{2} = 10 \text{ t ratios}$$

which is, of course, a cumbersome job. Second, the t ratio does not account for the interaction
effect in its statistical analysis. The variations in the scores may be due to the interactions
taking place among groups. Such variations are not accounted for by t ratios. In order to remove
these two limitations we turn to analysis of variance, originally developed by R A Fisher. Analysis
of variance is a class of statistical techniques through which we test the overall difference among
two or more (ordinarily more than two) sample means. Analysis of variance is of two
common types: simple analysis of variance or one-way analysis of variance, and complex
analysis of variance or two-way analysis of variance. Analysis of variance (of whatever type) is
often referred to by its contraction, ANOVA.

Before we take up discussion on ANOVA, a fundamental concept of ANOVA, that is, the
concept of sum of squares, must be properly understood. Let us take an example to illustrate the
meaning of the concept of sum of squares. Suppose we have conducted a study with two groups
of students concerning their attitude towards legalization of a (banned) drug. Each member of
Group A had never taken that drug and each member of Group B had taken this
drug at least once a week. A high score on the attitude scale indicated that students strongly
favoured legalization. Here we want to test the hypothesis that there is no difference in this
attitude towards legalization because the two groups of students have been taken from the same
population. Table 23.3 presents the scores of Group A and Group B towards the legalization of the
drug.

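Table 23.3 Group A and Group B attitude towards the legalization of drug

        Group A                 Group B
        $X_1$     $X_1^2$       $X_2$     $X_2^2$
        3         9             8         64
        2         4             10        100
        3         9             12        144
        8         64            10        100
Sum:    16        86            40        408

$N_1 = 4$, $N_2 = 4$

$$\bar{X}_1 = \frac{16}{4} = 4; \qquad \bar{X}_2 = \frac{40}{4} = 10$$

$$\Sigma X_{total} = 16 + 40 = 56; \qquad \bar{X}_{total} = \frac{56}{8} = 7$$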

In Table 23.3, the mean for Group A is 4 and the mean for Group B is 10. The overall mean
is 7. For obtaining the total sum of squares ($SS_{total}$), it is required that from each individual
score the overall mean must be subtracted, and the results squared and added. Thus

$$SS_{total} = \Sigma (X - \bar{X}_{total})^2 \qquad (23.12)$$

For the data in Table 23.3, the total sum of squares will then be

$$SS_{total} = (3-7)^2 + (2-7)^2 + (3-7)^2 + (8-7)^2 + (8-7)^2 + (10-7)^2 + (12-7)^2 + (10-7)^2$$
$$= 16 + 25 + 16 + 1 + 1 + 9 + 25 + 9 = 102$$

In this way, we find that the total sum of squares ($SS_{total}$) is the sum of squared deviations
from the overall mean and therefore, is a measure of the total variation of the distribution.





Another way of calculating the total sum of squares is through the raw score equation as under:

$$SS_{total} = \Sigma X_{total}^2 - \frac{(\Sigma X_{total})^2}{N} \qquad (23.13)$$

$$= (86 + 408) - \frac{(56)^2}{8} = 494 - 392 = 102$$

In analysis of variance (ANOVA), the total sum of squares is divided into two parts: within-
group sum of squares ($SS_w$) and between-group sum of squares ($SS_b$). The within sum of squares
component is the measure of variability of the dependent variable scores within each category of
the independent variable and therefore, is the variation in the dependent variable that can't
be attributed to the independent or treatment variable. The source of this variability
is the variables influencing the dependent variable other than the independent or
treatment variable. Thus all within-group variability must come from sources other than the
specified independent variables. The within-group sum of squares, in the above example,
is the sum of the squared deviations of scores in Group A around $\bar{X}_1$ plus the sum of the
squared deviations of scores in Group B around $\bar{X}_2$.

For the data in Table 23.3, the within-group sum of squares may be calculated as under:

$$SS_w = \Sigma (X_1 - \bar{X}_1)^2 + \Sigma (X_2 - \bar{X}_2)^2 \qquad (23.14)$$
$$= (3-4)^2 + (2-4)^2 + (3-4)^2 + (8-4)^2 + (8-10)^2 + (10-10)^2 + (12-10)^2 + (10-10)^2$$
$$= 1 + 4 + 1 + 16 + 4 + 0 + 4 + 0 = 30$$

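Now, the raw score equation for the within-group sum of squares may also be used to arrive at the
same result, that is, 30.

$$SS_1 = \Sigma X_1^2 - \frac{(\Sigma X_1)^2}{N_1} \qquad (23.15)$$
$$= 86 - \frac{(16)^2}{4} = 86 - 64 = 22$$

$$SS_2 = \Sigma X_2^2 - \frac{(\Sigma X_2)^2}{N_2}$$
$$= 408 - \frac{(40)^2}{4} = 408 - 400 = 8$$

$$SS_w = 22 + 8 = 30$$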


The between sum of squares (SS,,) is the sum of the sum of squares between each group and 
directly reflects the impact of the independent variable or treatment variable. In other words, the 
between sum of squares component is a measure of the variation between the groups of the 
independent variable and therefore, is the variation in the dependent variable that is attributable 
to the independent or treatment variable. It may be obtained by subtracting the overall mean 


from each group mean, squaring the result, multiplying by N in each group and finally summing across
all the groups. Thus

$$SS_b = \Sigma N_i (\bar{X}_i - \bar{X}_{total})^2 \qquad (23.17)$$

where $N_i$ is the number in the ith group and $\bar{X}_i$ is the mean of the ith group.





The raw score formula for calculating the between-group sum of squares is:

$$SS_b = \frac{(\Sigma X_1)^2}{N_1} + \frac{(\Sigma X_2)^2}{N_2} - \frac{(\Sigma X_{total})^2}{N} \qquad (23.18)$$
$$= \frac{(16)^2}{4} + \frac{(40)^2}{4} - \frac{(56)^2}{8} = (64 + 400) - 392 = 464 - 392 = 72$$

However, most simply $SS_b$ can be calculated by subtracting the value of $SS_w$ from $SS_{total}$:

$$SS_b = SS_{total} - SS_w = 102 - 30 = 72$$

In simple analysis of variance, there is only one independent variable and the samples are
classified into several groups on the basis of this variable. Since the basis of classification is only
one independent variable, the simple analysis of variance is also known as one-way ANOVA.
Such ANOVA is suited to the completely randomized design. In complex ANOVA there are two
or more than two independent variables, which form the basis of classification of groups. Such
ANOVA is suited to factorial design.

Statistically, the F ratio is calculated as follows.

$$F = \frac{\text{Larger variance}}{\text{Smaller variance}} \quad \text{or} \quad F = \frac{\text{Between-groups variance}}{\text{Within-groups variance}}$$


Between-groups variance refers to variation in the mean of each group from the total or 
grand mean of all groups. Within-groups variance refers to the average variability of scores within 
each group. The theme of the analysis of variance is that if the groups have been randomly 
selected from the population, these two variances, namely, between-groups variance and 
within-group variance are the unbiased estimates of same population variance. The significance 
of difference between these two types of variances is tested through the F test. 

ANOVA has some assumptions which should be met.

(a) The population distribution for each treatment condition should be normal. In other
words, there should be normality within the groups which have been measured on the dependent
variable.

(b) The individuals who have been observed should be distributed randomly in the groups.

(c) The dependent variable should be measured on an interval scale and the independent
variable(s) should be measured on a nominal scale (Elifson 1990).

(d) Within-groups variances must be approximately equal or homogeneous. This is referred
to as homogeneity of variances. There are ordinarily two popular methods of
determining the homogeneity of variances: Bartlett's test of homogeneity of variances
and Hartley's F-Max test. For Bartlett's test, readers are referred to Experimental Design
(1982). Here we shall concentrate upon Hartley's F-Max test only.

Harley's F-Max test is based on the principle that a sample variance mings a 
estimate of the population Variance. Therefore, if the sil aes aie init 7 
sample variances should be ordinarily very similar. The procedure a 
under: 

(a) Complete sample variance for each of th 

SS _ 55 where §S is the su 


computed by aa = w 


e separate samples. The sample variance Is 
m of squared deviations of each sample 


. xy 


=, 


separately and calculated by 2X" ~~ 










(b) Select the largest and the smallest of these sample variances and compute F-Max as
under:

$$F\text{-Max} = \frac{\text{Largest sample variance}}{\text{Smallest sample variance}} \qquad (23.20)$$

A relatively large value of the F-Max test indicates that there exists a large difference among the
sample variances. In such a situation, the data suggest that the population variances are different
and that the assumption of homogeneity of variances has been violated. On the other hand, a
small value of the F-Max test (that is, near 1.00) indicates that the sample variances are similar and,
therefore, the assumption of homogeneity of variance is reasonable. Let us illustrate this with an
example. Suppose there are three independent samples, each having the same n, with sample
variances of 12.66, 10.78 and 11.79. For these data,

$$F\text{-Max} = \frac{\text{Largest sample variance}}{\text{Smallest sample variance}} = \frac{12.66}{10.78} = 1.17$$

The data do not provide evidence that the assumption of homogeneity of variance has been
violated because the value of the F-Max test is near 1.

There is also another way of reaching a decision about the assumption of homogeneity of
variance: comparing the value of the F-Max test with the critical value provided in the F-Max table. For
using this Table, we need to know k = number of separate samples, df = n - 1 and the alpha level as
predetermined by the investigator. For the above data, we have k = 3 and df = n - 1 = 10. The Table gives a
value of 5.34 at the alpha level of 0.05 and a value of 8.5 at the alpha level of 0.01.
Since the obtained value of the F-Max test is less than 5.34, we may conclude that the assumption of
homogeneity of variance is reasonable. Had the obtained value been larger than the critical value given
in the Table, we would have concluded that the homogeneity assumption is not tenable.
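The F-Max decision rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `hartley_f_max` is hypothetical, and the critical value is simply the 0.05 figure quoted in the text rather than a value looked up by the code.

```python
def hartley_f_max(variances):
    """Return Hartley's F-Max: largest sample variance / smallest sample variance."""
    if len(variances) < 2:
        raise ValueError("need at least two sample variances")
    if min(variances) <= 0:
        raise ValueError("sample variances must be positive")
    return max(variances) / min(variances)


# Sample variances of the three independent samples in the example.
sample_variances = [12.66, 10.78, 11.79]
f_max = hartley_f_max(sample_variances)

# Critical value quoted in the text for alpha = 0.05 (taken as given, not computed).
critical_05 = 5.34
homogeneity_reasonable = f_max < critical_05

print(round(f_max, 2), homogeneity_reasonable)  # 1.17 True
```

Because only the ratio of the extreme variances is used, the same function applies to any number of samples, provided they are of equal size as the test assumes.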
For an illustration of simple analysis of variance, consider the hypothetical scores of three
groups given in Table 23.4. Each group of ten subjects has been given an educational
achievement test, and ANOVA has been calculated from these scores.

Grand sum (ΣX): 561 + 154 + 312 = 1027

Grand sum of squares (ΣX²): 38611 + 2610 + 16070 = 57291

Step 1: Correction (C):

$$C = \frac{(\Sigma X)^2}{N} = \frac{(1027)^2}{30} = 35157.63$$

Step 2: Total sum of squares (TSS):

$$TSS = \Sigma X^2 - C = 57291 - 35157.63 = 22133.37$$



Table 23.4 Simple ANOVA based on the hypothetical scores of three groups

        Gr. A   Gr. B   Gr. C    Gr. A    Gr. B    Gr. C
          X       X       X       X²       X²       X²
         78      10      50      6084      100     2500
         58      14      58      3364      196     3364
         40       9      65      1600       81     4225
         30      12      72       900      144     5184
         10      20      10       100      400      100
         65      25       9      4225      625       81
         88      12       8      7744      144       64
         87      18      10      7569      324      100
         80      14      16      6400      196      256
         25      20      14       625      400      196
Sum:    561     154     312     38611     2610    16070


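Step 3: Between (or among) sum of squares (BSS):

$$\begin{aligned}
BSS &= \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \frac{(\Sigma X_3)^2}{n_3} - C \\
    &= \frac{(561)^2}{10} + \frac{(154)^2}{10} + \frac{(312)^2}{10} - 35157.63 \\
    &= \frac{314721 + 23716 + 97344}{10} - 35157.63 \\
    &= 43578.10 - 35157.63 = 8420.47
\end{aligned}$$

Step 4: Within sum of squares (WSS): TSS - BSS = 22133.37 - 8420.47 = 13712.90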





Summary: Analysis of Variance

Source of variation   df                     Sum of squares   Mean square or variance
Between-groups        k - 1 = 3 - 1 = 2          8420.47          4210.235
Within-groups         N - k = 30 - 3 = 27       13712.90           507.885
Total                 N - 1 = 30 - 1 = 29       22133.37

$$F = \frac{\text{Between-groups variance}}{\text{Within-groups variance}} = \frac{4210.235}{507.885} = 8.29$$
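The four computational steps and the summary table can be checked with a short Python sketch. It works only from the group sums and sums of squares printed in Table 23.4, and it assumes equal group sizes; the function name `one_way_anova_from_sums` and the dictionary keys are illustrative choices, not part of the original text.

```python
def one_way_anova_from_sums(sums, sums_of_squares, n_per_group):
    """One-way ANOVA from per-group sums (ΣX) and sums of squares (ΣX²).

    Assumes every group has the same number of scores (n_per_group).
    """
    k = len(sums)
    n_total = k * n_per_group
    correction = sum(sums) ** 2 / n_total                      # Step 1: C
    total_ss = sum(sums_of_squares) - correction               # Step 2: TSS
    between_ss = sum(s ** 2 for s in sums) / n_per_group - correction  # Step 3: BSS
    within_ss = total_ss - between_ss                          # Step 4: WSS
    df_between, df_within = k - 1, n_total - k
    ms_between = between_ss / df_between
    ms_within = within_ss / df_within
    return {
        "total_ss": total_ss,
        "between_ss": between_ss,
        "within_ss": within_ss,
        "df": (df_between, df_within),
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Column totals of Table 23.4 (Groups A, B and C, ten scores each).
result = one_way_anova_from_sums(
    sums=[561, 154, 312],
    sums_of_squares=[38611, 2610, 16070],
    n_per_group=10,
)
print(round(result["between_ss"], 2), round(result["within_ss"], 2), round(result["F"], 2))
```

The obtained F is then compared with the tabled critical value for the degrees of freedom returned in `result["df"]`.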


Since there are 30 cases in the above problem, we have N - 1 = 30 - 1 = 29 df for the total. The
number of df for between-groups is equal to the number of groups minus one. Since there are three
groups, df for between-groups is 3 - 1 = 2. The number of df for within-groups is equal to the total number
of cases minus the number of groups. Hence, it is equal to N - k = 30 - 3 = 27. After calculating
the number of degrees of freedom and the sum of squares for each of the three sources of
variation, we compute mean squares or variances, which are obtained by dividing each
sum of squares by its number of degrees of freedom. These two types of variances are
the estimates of the population variance. We obtain the F ratio by dividing the between-groups
variance by the within-groups variance.

The F ratio is interpreted by the use of the F Table (Guilford & Fruchter 1978). In the F Table,
the number of degrees of freedom for the greater mean square (df₁) is written at the top and the number
of degrees of freedom for the smaller mean square (df₂) is written on the left-hand side. For this
problem, df₁ = 2 and df₂ = 27. Locating these dfs, we find that the required F ratio at the 0.05
level is 3.35 and at the 0.01 level is 5.49. Since the obtained value of the F ratio is 8.29, which
exceeds both values, we reject the null hypothesis and conclude that there is an overall difference
between the three groups of subjects on the educational achievement test.


Tests after the F test (Post Hoc tests)
One general limitation of the F test is that it only tells about the overall difference between the
groups under study but tells nothing about the location of the difference. For example, in
the above problem the obtained F ratio of 8.29 is significant, which indicates that there
is a significant difference between the groups under study, but whether the significant difference is
between A and B or A and C or B and C cannot be said. Therefore, when the F ratio is significant, we
need some additional tests after this F test.
In fact, in ANOVA the null hypothesis states that there is no treatment effect and therefore, all
sample means are the same. Although this appears to be a very simple conclusion, in most cases,
it creates many problems. For example, suppose there are only two treatment groups in the
experiment. The null hypothesis will state that the mean of one treatment group does not differ from
the mean of the other treatment group. If this hypothesis is rejected, the straightforward
conclusion is that the two means are not equal. But when there are three (or more) treatment
groups, the problem gets complicated. With k = 3, for example, rejecting the null hypothesis
indicates that not all means of the treatment groups are the same. Now here, the researcher has to
decide which ones are different. Is Mean₁ different from Mean₂? Is Mean₂ different from Mean₃?
Is Mean₁ different from Mean₃? Are all three different? The purpose of post hoc tests is to answer
these questions.

Post hoc tests are done after an analysis of variance has been carried out. These are called 
post hoc comparisons because they are done after the fact and not planned in advance (Aron, 
Aron & Coups 2006). In general, a post hoc test enables the researcher to go back through the
data and compare the individual treatments two at a time, that is, in pairs. In statistical terms, this is
called making pairwise comparisons.

Although there are different post hoc test procedures because there are many different ways
of controlling the alpha level, these tests can be classified into two broad categories: a priori or
planned comparisons and a posteriori or unplanned comparisons.
A priori tests are used for comparing the specific treatment conditions that are identified
before the experiment begins. These tests generally make little effort to control the alpha level.
They can be used even when the overall F ratio is not significant. The rationale
is that a planned comparison is, in effect, a small experiment (comparing two treatments)
which is contained within a larger experiment. When the researcher has specific hypotheses for
this small portion of the experiment, it is reasonable to test its result separately.


A posteriori tests are used for treatment comparisons that were not necessarily planned at the
start of the experiment. In fact, these tests try to control the overall alpha level by making
adjustments for the number of different potential comparisons in the experiment. An a posteriori test
is justified when the F ratio from the overall ANOVA is significant. If the overall F is not significant,
a comparison can still be made simply to check the difference between the highest and the lowest means. It may, however,
be noted that a priori comparisons are more powerful than post hoc tests. If the
overall F is significant and the researcher is using post hoc tests, he should not use the
conventional t test because the alpha level at which he would be testing will be numerically
higher than the alpha level originally specified. This is likely to happen because some of the
obtained significant differences may be due to chance factors. It may also be clearly noted that as
the number of comparisons increases, the probability of Type I error also increases.


A variety of post hoc tests are available. For a given situation, different tests may be used and
the outcomes of these different tests may be different. Statisticians who are working with these
tests hold differing views regarding the appropriateness of each test. The available tests include the
protected t test, Tukey's Honestly Significant Difference (HSD) test and the Scheffe test. The
two most popular post hoc tests are Tukey's Honestly Significant Difference (HSD) test and the
Scheffe test, which will be discussed. Tukey's HSD test allows the researcher to compute a single
value that determines the minimum difference between treatment means that is
essential for significance. This value is called the Honestly Significant Difference or HSD,
which is then used to compare any two treatment conditions. If the mean difference exceeds
Tukey's HSD, the researcher concludes that there is a significant difference between the treatments. If
the mean difference is less than Tukey's HSD, he concludes that there is no significant difference
between the treatments. The formula for Tukey's HSD is as under:

$$HSD = q\sqrt{\frac{MS_{within}}{n}} \qquad (23.21)$$







where, $MS_{within}$ = within-treatment variance
n = number of scores in each treatment
q = studentized range statistic.
For further details of the Table of Studentized Range statistic, readers are referred to
Gravetter and Wallnau (1987, A-35).

Taking the example from Table 23.4, we can calculate HSD for making three post hoc
comparisons and determining in which pair(s) the significant mean difference actually lies.

From Table 23.4,
n = 10
k (or number of treatments/columns) = 3
Alpha level for q = .05; df for error term = N - k = 30 - 3 = 27
q = 3.51 (by interpolation from Table 23.5)

$$HSD = 3.51\sqrt{\frac{507.885}{10}} = 3.51\sqrt{50.79} = 3.51 \times 7.13 = 25.02$$





We can make the following conclusions:

1. Mean A - Mean B = 56.10 - 15.4 = 40.7 (Significant)
2. Mean A - Mean C = 56.10 - 31.2 = 24.9 (Not significant)
3. Mean C - Mean B = 31.2 - 15.4 = 15.8 (Not significant)

Since only the mean difference between A and B exceeds 25.02, it is regarded as significant. The
other two mean differences, that is, between A and C and between B and C, are less than 25.02, so they are
not significant. In this example, the mean difference between any two samples must be at least
25.02 to be significant.
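The pairwise comparisons against the HSD can be sketched in Python as follows. The value q = 3.51 is taken from the text as given (the code does not look it up), and the function name `tukey_hsd` is a hypothetical helper. Without the intermediate rounding of the square root to 7.13, the HSD comes out marginally below the 25.02 shown above, which does not change any of the three decisions.

```python
import math
from itertools import combinations


def tukey_hsd(q, ms_within, n):
    """Honestly Significant Difference: q * sqrt(MS_within / n)."""
    return q * math.sqrt(ms_within / n)


# Group means from Table 23.4 (group sum / 10) and the within-groups mean square.
means = {"A": 56.1, "B": 15.4, "C": 31.2}
hsd = tukey_hsd(q=3.51, ms_within=507.885, n=10)

# A pair differs significantly when its absolute mean difference exceeds the HSD.
significant_pairs = set()
for first, second in combinations(means, 2):
    difference = abs(means[first] - means[second])
    if difference > hsd:
        significant_pairs.add((first, second))
    print(first, second, round(difference, 1), difference > hsd)
```

Only the A and B pair is flagged, in line with the conclusions listed above.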


BNifican 
Since mean differen 


Table 23.5 A selected portion of the values of the studentized range statistic (q)*
(Light face = 0.05; Bold face = 0.01)
Columns: k = number of treatments; rows: df for error term. (Table body not reproduced.)
*Source: F J Gravetter & L B Wallnau (1987).

Following the Scheffe technique also, we can locate the difference between three means.
Since there are three groups, three comparisons are likely to be made, namely, A vs B, A vs C and
B vs C. Following the Scheffe technique, we are required to compute the F ratio with the help of
Equation 23.22 for each of the three pairs of groups.

$$F = \frac{(M_1 - M_2)^2}{SD_w^2\,(N_1 + N_2)/N_1 N_2} \qquad (23.22)$$

Here $SD_w^2$ is the within-groups variance (507.885 in this example), and $N_1$ and $N_2$ are the
numbers of cases in the two groups being compared. Now, three Fs for three pairs of distributions can be
calculated as follows:

F ratio for distributions A and B:

$$F = \frac{(56.10 - 15.40)^2}{507.885\,(10 + 10)/(10)(10)} = 16.31$$

F ratio for distributions A and C:

$$F = \frac{(56.10 - 31.20)^2}{507.885\,(10 + 10)/(10)(10)} = 6.10$$

F ratio for distributions B and C:

$$F = \frac{(15.40 - 31.20)^2}{507.885\,(10 + 10)/(10)(10)} = 2.46$$

As pointed out earlier, F at the 0.05 level of significance for df₁ = 2 and df₂ = 27 is 3.35. This
value, multiplied by k - 1, yields (3 - 1)(3.35) = 6.70. Only the F ratio for distributions A and B is
greater than 6.70. Hence, it is concluded that there is a significant difference between the means
of A and B only. The mean differences between A and C and between B and C are not significant. The same
conclusion had been arrived at by using Tukey's Honestly Significant Difference test.
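The three Scheffe comparisons can be reproduced with the sketch below. It assumes the form of Equation 23.22 with the within-groups variance in the denominator; the function name `scheffe_f` is hypothetical, and the tabled F of 3.35 is taken from the text rather than computed.

```python
from itertools import combinations


def scheffe_f(mean_1, mean_2, ms_within, n_1, n_2):
    """F ratio for one pairwise Scheffe comparison (form of Equation 23.22)."""
    return (mean_1 - mean_2) ** 2 / (ms_within * (n_1 + n_2) / (n_1 * n_2))


means = {"A": 56.1, "B": 15.4, "C": 31.2}
ms_within = 507.885      # within-groups variance from the ANOVA summary
n = 10                   # cases per group
k = len(means)

# Scheffe criterion: (k - 1) times the tabled F for df 2 and 27 at the 0.05 level.
criterion = (k - 1) * 3.35

scheffe_results = {}
for first, second in combinations(means, 2):
    f_ratio = scheffe_f(means[first], means[second], ms_within, n, n)
    scheffe_results[(first, second)] = (round(f_ratio, 2), f_ratio > criterion)

print(scheffe_results)
```

Because the criterion is multiplied by k - 1, the Scheffe test is more conservative than an ordinary F test on each pair.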

So far as the computation of the two-way analysis of variance is concerned, the reader is
referred to Chapter 21 where details of statistical calculations have been shown.

Analysis of variance (ANOVA) may be univariate analysis or multivariate analysis. In
univariate analysis there is one dependent variable and there may be more than one independent
variable. In multivariate analysis of variance, there is more than one dependent variable.
Usually the dependent variables are different measures of approximately the same thing, such as two
different reading ability tests. This is called multivariate analysis of variance (MANOVA), which
is different from an ordinary analysis of variance because in MANOVA there is more than one
dependent variable. When the researcher finds an overall significant difference among groups
with MANOVA, this means that the groups differ on the combination of dependent variables.
Subsequently, the researcher proceeds to find out whether the groups differ on any or all of the
dependent variables considered individually. In this way, MANOVA is followed by an ordinary
analysis of variance for each of the dependent variables. In the present text, no attempt will be
made to illustrate the computation of MANOVA.
Likewise, the analysis of covariance in which there is more than one dependent variable is
called multivariate analysis of covariance (MANCOVA). MANCOVA differs from an ordinary
analysis of covariance (ANCOVA) because in the former, there is more than one dependent
variable, whereas in the latter, there is only one dependent variable. In the present text,
MANCOVA will not be illustrated with numerical examples. However, ANCOVA will be
discussed in detail with numerical examples.


3. Analysis of Covariance (ANCOVA) 


The analysis of covariance (ANCOVA) was developed by R A Fisher and the very first example of
its use appeared in the literature in 1932. Most of its early applications are from agricultural
experimentation. A close examination of these examples in agriculture also helps to clarify its
application in the behavioural sciences. Analysis of covariance, a most widely used elaboration
of analysis of variance, is a technique in which an indirect method, or statistical control, is employed
to enhance the precision of the experiment. In its procedure or methodology, ANCOVA may be
equated with partial correlation, where the researcher seeks a measure of correlation between
two sets of variables (dependent and independent) by partialling out the impact of third or
intervening variables. In ANCOVA the researcher also tries to partial out the side effects, if any, in
the experiment due to lack of proper experimental control over the intervening
variables or covariates (or covariables). Each of the variables controlled for (or partialled out or
held constant) is called a covariate. In ANCOVA, the statistical control is achieved by including
measures on a concomitant variate (X). This is an uncontrolled variable, also called a covariate, and is
not itself of experimental interest. The other variate, which is of experimental interest, is termed
the criterion and designated as Y. In this way, in ANCOVA the researcher obtains two
observations (X and Y) from each participant. Measurements on X (covariate), obtained before
giving treatment, are used to adjust the measurements on Y (criterion). When X and Y are
associated, a part of the variance of Y occurs due to the variation in X. In fact, ANCOVA is a
method of making adjustments that are linked to the problem of correlation. Here the problem is
to specify how much of the variation of Y can be predicted from the variation in X and then to subtract
this to obtain the remaining (or leftover) variation as the adjusted value. A formula for the purpose
of correcting scores for initial differences in X is:

$$SS_{y.x} = SS_y - \frac{(SP)^2}{SS_x} \qquad (23.23)$$

where, $SS_{y.x}$ = sum of squares of Y when variability contributed by X has been controlled or
removed;
$SS_y$ = sum of squares of Y scores;
$SS_x$ = sum of squares of X scores;
$SP$ = sum of products of deviations of X and Y.

In this way, it can be said that ANCOVA is a method of analysis that enables the
investigator to equate the pre-experimental status of the groups in terms of relevant known variables.
Differences in the initial status of the groups can be removed statistically so that the groups can be
compared as though their initial status had been equated. The scores which have been corrected by
this procedure are commonly known as residuals because they are what remains after the
inequalities have been corrected or removed. Results of ANCOVA are reported like those of the analysis
of variance, with one exception. While reporting results, instead of giving the means of
each group, the researcher may give the adjusted means, that is, the means of each group after adjusting for
or partialling out the effect of the covariates.
The assumptions of ANCOVA are as under:
(i) The X scores (covariate) are not affected by treatments.
(ii) The dependent variable that is under measurement should be normally distributed in the
population.
(iii) The regression coefficient of the final scores (Y) on initial scores (X) should be more or
less the same in all groups and linear, that is, correlations between X and Y scores are the same
for all groups.
(iv) Treatment groups are selected at random from the same population.
(v) Within-group variances should be approximately equal.
(vi) The contribution of variances in the total sample should be additive.


Numerical Example I

Three groups of five students each were randomly selected from class X of a school. They were
then rated for their leadership quality and their scores were obtained. Subsequently, they were
subjected to three different training techniques for improving their leadership qualities for a
month. After a month of training, they were again assessed for their leadership qualities and their
final scores were also obtained. The data so collected are presented in Table 23.6.


Table 23.6 Measures on a covariate (X) and a criterion variate (Y) for single factor experiment

               Group A                  Group B                  Group C
          Initial    Final         Initial    Final         Initial    Final
          scores     scores        scores     scores        scores     scores
            X1         Y1            X2         Y2            X3         Y3
             3          6             8         10             7         14
             4          8             8          8             4          4
             5          5             5         12             5         12
             3          6             9         10             6         13
             2          4             5         13             8         10
Sum:        17         29            35         53            30         53
Mean:      3.4        5.8           7.0       10.6           6.0       10.6


The above data call for using ANCOVA. There may be many observable differences among
the initial scores of the three groups because no attempt was made by the researcher to make
the three groups equivalent at the start of the study, that is, before subjecting these groups to
different treatments. In the absence of such an experimental control, the researcher is forced
to exercise statistical control by using ANCOVA. The computational work for ANCOVA may be
started by arranging the above data as under (cf. Table 23.7).

Table 23.7 Computational arrangement for ANCOVA

Group A
           X1     Y1    X1Y1    X1²    Y1²
            3      6     18      9     36
            4      8     32     16     64
            5      5     25     25     25
            3      6     18      9     36
            2      4      8      4     16
Sum:       17     29    101     63    177
Mean:     3.4    5.8

Group B
           X2     Y2    X2Y2    X2²    Y2²
            8     10     80     64    100
            8      8     64     64     64
            5     12     60     25    144
            9     10     90     81    100
            5     13     65     25    169
Sum:       35     53    359    259    577
Mean:     7.0   10.6

Group C
           X3     Y3    X3Y3    X3²    Y3²
            7     14     98     49    196
            4      4     16     16     16
            5     12     60     25    144
            6     13     78     36    169
            8     10     80     64    100
Sum:       30     53    332    190    625
Mean:     6.0   10.6

ΣX = 17 + 35 + 30 = 82
ΣY = 29 + 53 + 53 = 135

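Steps of computation

A. ANOVA of X scores:

(i) Correction ($C_x$):

$$C_x = \frac{(\Sigma X)^2}{N} = \frac{(17 + 35 + 30)^2}{15} = \frac{(82)^2}{15} = 448.27$$

(ii) Total sum of squares ($SS_{tx}$):

$$SS_{tx} = \Sigma X^2 - C_x = (63 + 259 + 190) - 448.27 = 512 - 448.27 = 63.73$$

(iii) Between group sum of squares ($SS_{bx}$):

$$\begin{aligned}
SS_{bx} &= \frac{(\Sigma X_1)^2}{N_1} + \frac{(\Sigma X_2)^2}{N_2} + \frac{(\Sigma X_3)^2}{N_3} - C_x \\
        &= \frac{17^2}{5} + \frac{35^2}{5} + \frac{30^2}{5} - 448.27 \\
        &= (57.8 + 245 + 180) - 448.27 = 482.8 - 448.27 = 34.53
\end{aligned}$$

(iv) Within group sum of squares ($SS_{wx}$) = Total $SS_x$ - Between $SS_x$
= 63.73 - 34.53 = 29.2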



Table 23.8 Summary of ANOVA of X scores

Source of variation       SS       df                      MS        F
Between groups            34.53    k - 1 = 3 - 1 = 2       17.265
Within groups (error)     29.20    N - k = 15 - 3 = 12      2.433    7.10
Total                     63.73    14

F = 17.265/2.433 = 7.10; for df 2, 12: F at 0.05 = 3.88, F at 0.01 = 6.93


B. ANOVA of Y scores:

(i) Correction ($C_y$):

$$C_y = \frac{(\Sigma Y)^2}{N} = \frac{(29 + 53 + 53)^2}{15} = \frac{(135)^2}{15} = 1215$$

(ii) Total sum of squares ($SS_{ty}$):

$$SS_{ty} = \Sigma Y^2 - C_y = (177 + 577 + 625) - 1215 = 1379 - 1215 = 164$$

(iii) Between groups sum of squares ($SS_{by}$):

$$\begin{aligned}
SS_{by} &= \frac{(\Sigma Y_1)^2}{N_1} + \frac{(\Sigma Y_2)^2}{N_2} + \frac{(\Sigma Y_3)^2}{N_3} - C_y \\
        &= \frac{29^2}{5} + \frac{53^2}{5} + \frac{53^2}{5} - 1215 \\
        &= (168.2 + 561.8 + 561.8) - 1215 = 1291.8 - 1215 = 76.8
\end{aligned}$$

(iv) Within group sum of squares ($SS_{wy}$) = Total $SS_y$ - Between $SS_y$
= 164 - 76.8 = 87.2

Table 23.9 Summary of ANOVA of Y scores

Source of variation       SS       df                      MS       F
Between groups            76.8     k - 1 = 3 - 1 = 2       38.4
Within groups (error)     87.2     N - k = 15 - 3 = 12      7.26    5.28
Total                    164.0     14

F = 38.4/7.26 = 5.28; for df 2, 12: F at 0.05 = 3.88, F at 0.01 = 6.93

C. Analysis of covariance:

(i) Total sum of products ($SP_t$):

$$\begin{aligned}
SP_t &= (\Sigma X_1 Y_1 + \Sigma X_2 Y_2 + \Sigma X_3 Y_3) - \frac{\Sigma X \cdot \Sigma Y}{N} \\
     &= (101 + 359 + 332) - \frac{82 \times 135}{15} \\
     &= 792 - 738 = 54
\end{aligned}$$

(ii) Sum of products between groups ($SP_b$):

$$\begin{aligned}
SP_b &= \frac{\Sigma X_1 \cdot \Sigma Y_1}{N_1} + \frac{\Sigma X_2 \cdot \Sigma Y_2}{N_2} + \frac{\Sigma X_3 \cdot \Sigma Y_3}{N_3} - \frac{\Sigma X \cdot \Sigma Y}{N} \\
     &= \frac{17 \times 29}{5} + \frac{35 \times 53}{5} + \frac{30 \times 53}{5} - \frac{82 \times 135}{15} \\
     &= 98.6 + 371 + 318 - 738 \\
     &= 787.6 - 738 = 49.6
\end{aligned}$$

(iii) Sum of products within groups ($SP_w$) = $SP_t$ - $SP_b$
= 54 - 49.6 = 4.4

For convenience, the sum of products (SP) may be arranged with the sums of squares for both X
and Y variates as under:

Table 23.10 Summary of sum of squares of X and Y as well as that of sum of products

                           Total     Within    Between
Sum of products (XY)       54        4.4       49.6
Sum of squares (X)         63.73     29.2      34.53
Sum of squares (Y)         164       87.2      76.8


(iv) Adjusted sum of squares for Y ($SS_{y.x}$):

$$SS_{y.x} = SS_{ty} - \frac{(SP_t)^2}{SS_{tx}} = 164 - \frac{(54)^2}{63.73} = 164 - 45.75 = 118.25$$

(v) Within group adjusted sum of squares ($SS_{y.x_w}$):

$$SS_{y.x_w} = SS_{wy} - \frac{(SP_w)^2}{SS_{wx}} = 87.2 - \frac{(4.4)^2}{29.2} = 87.2 - 0.66 = 86.54$$

(vi) Between group adjusted sum of squares ($SS_{y.x_b}$)
= Total adjusted SS - Within adjusted SS
= 118.25 - 86.54 = 31.71
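The adjustment in steps (iv) to (vi), and the F ratio of the covariance analysis that follows from it, can be sketched in Python. The function name `ancova_from_sums` and its argument names are illustrative; the error degrees of freedom are taken as N - k - 1, assuming one df is lost for the regression of Y on X. Since the code does not round intermediate values, its F may differ from a hand-computed figure in the second decimal place.

```python
def ancova_from_sums(ss_x_total, ss_x_within, ss_y_total, ss_y_within,
                     sp_total, sp_within, k, n_total):
    """Adjusted sums of squares and F for a one-covariate ANCOVA.

    Inputs are the total and within-groups sums of squares of X and Y and
    the corresponding sums of products (as in Table 23.10).
    """
    adj_total = ss_y_total - sp_total ** 2 / ss_x_total
    adj_within = ss_y_within - sp_within ** 2 / ss_x_within
    adj_between = adj_total - adj_within
    df_between = k - 1
    df_within = n_total - k - 1          # one df lost for the regression of Y on X
    f_ratio = (adj_between / df_between) / (adj_within / df_within)
    return adj_total, adj_within, adj_between, (df_between, df_within), f_ratio


adj_total, adj_within, adj_between, df, f_adjusted = ancova_from_sums(
    ss_x_total=63.73, ss_x_within=29.2,
    ss_y_total=164, ss_y_within=87.2,
    sp_total=54, sp_within=4.4,
    k=3, n_total=15,
)
print(round(adj_total, 2), round(adj_within, 2), round(adj_between, 2), df)
print(f_adjusted < 3.98)  # compared with the tabled F for df 2, 11 at 0.05
```

The adjusted between-groups F is close to 2.0 and falls well below the tabled value, so the groups do not differ once the covariate is taken into account.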


Table 23.11: Summary of covariance analysis

Source of variation       SS        df                               MS       F
Between groups            31.71     k - 1 = 3 - 1 = 2                15.85
Within groups (error)     86.54     k(n - 1) - 1 = 3(5 - 1) - 1 = 11*  7.86   2.01
Total                    118.25     13

F = 15.85/7.86 = 2.01; for df 2, 11: F at 0.05 = 3.98, F at 0.01 = 7.20

*1 df is lost because of regression of Y on X.

Steps of computation
Step 1: ANOVA of X scores
In this step, ANOVA of X scores has been carried out separately. The outcome of this ANOVA
is summarized in Table 23.8. The outcome, that is, the F ratio, shows that the three groups differ
significantly. F (df = 2, 12) = 7.10 is more than the F value at the 0.01 level, so it is significant at the 0.01 level.
Step 2: ANOVA of Y scores
In this second step, ANOVA of Y scores has been worked out separately. The summary of
ANOVA of Y scores has been presented in Table 23.9. It is clear that the three groups also differ on
the criterion measure since the obtained F, that is, 5.28, exceeds the value of F at the 0.05 level.
Hence, again it is concluded that the three groups differed on the Y (criterion) measure.
Step 3: Analysis of covariance
In this step, the analysis of covariance has been carried out. First, the total sum of products has been
calculated and this sum of products has been divided into the sum of products between groups
and the sum of products within groups. Subsequently, computations have been made for
the purpose of correcting Y scores for differences in X scores. The symbol $SS_{y.x}$ has been
used to refer to the adjusted sum of squares, that is, the variability in Y when the variability contributed by
X scores is held constant. For convenience, the sums of squares of X, the sums of squares of Y and the
sums of products (XY) have been summarized in Table 23.10, and the covariance analysis itself in
Table 23.11. It is observed that the obtained F = 2.01 is not significant even at the 0.05 level. Thus the analysis of
variance for adjusted scores does not show a significant difference between the groups. Here
H₀ (null hypothesis) is retained with the conclusion that the groups do not differ significantly after
giving treatments. Hence, of the three training techniques, no one technique can be said to be
better than another.

From this numerical example, it is clear that the three groups differ
significantly on X scores as well as Y scores. However, when Y scores are adjusted to
account for the correlation between the two variables, the groups no longer differ significantly. It appears
that the difference in Y scores was simply a reflection of differences in X scores,
that is, there is regression of Y on X. Let us find out if the regression of Y on X is really significant.

The total sum of squares of Y measures is divided into two components: one
corresponding to those parts of Y scores which can be predicted from the X scores and the other
(residual) corresponding to those parts of Y scores which are independent of X.

$$\text{Sum of squares (SS) due to regression} = \frac{(SP_t)^2}{SS_{tx}} = \frac{(54)^2}{63.73} = \frac{2916}{63.73} = 45.75$$

Total SS of Y = 164
Residual = 164 - 45.75 = 118.25

Out of the total variation of Y scores of 164, 45.75 is due to those parts of Y scores which can
be predicted from X scores and 118.25 corresponds to those parts of Y scores which are
independent of X scores. The test of significance of the regression of Y on X is presented in
Table 23.12.

Table 23.12. ANOVA testing the significance of regression of Y on X

Source of variation       SS        df     MS       F
Due to regression         45.75      1     45.75
Residual (error)         118.25     13      9.09    5.03
Total                    164.00     14

F = 45.75/9.09 = 5.03
F (df 1, 13) = 4.67 (0.05 level)
F (df 1, 13) = 9.07 (0.01 level)

Since the obtained F exceeds the critical value of F at the 0.05 level for df 1 and 13, H₀ (null
hypothesis) is rejected and we conclude that the regression of Y on X is significant. This shows that
X and Y scores are significantly associated. In general, the outcome of ANCOVA depends on how
highly X and Y scores are correlated. In the above example, the correlation between X and
Y within groups is not significant ($r_w$ = 0.087), whereas the correlation between X and
Y means is quite high ($r_b$ = 0.96), and therefore the outcome, that is, F for ANCOVA, is not significant.

$$r_w = \frac{SP_w}{\sqrt{SS_{wx} \cdot SS_{wy}}} = \frac{4.4}{\sqrt{29.2 \times 87.2}} = \frac{4.4}{50.4} = 0.087$$

$$r_b = \frac{SP_b}{\sqrt{SS_{bx} \cdot SS_{by}}} = \frac{49.6}{\sqrt{34.53 \times 76.8}} = \frac{49.6}{51.5} = 0.96$$

So far as this example is concerned, the matter ends here. But suppose the obtained F of
ANCOVA becomes significant and, therefore, H₀ is rejected. In such a case, our objective then
will be to proceed ahead and analyze the differences in final scores after correction or adjustment
and conclude which one of these treatments is better than the others. Hence, another numerical
example follows.

Numerical Example II

In this second example, we are going to discuss a different experiment where the groups are not
significantly different on the X as well as the Y variates but the adjusted F ($F_{y.x}$) is significant. This example
illustrates a situation which is opposite to the first numerical example. The data on this fictitious
experiment are as under:

Table 23.13. Scores on a covariate (X) and a criterion variate (Y) for the single factor experiment

               Group A                  Group B                  Group C
          Initial    Final         Initial    Final         Initial    Final
          scores     scores        scores     scores        scores     scores
            X1         Y1            X2         Y2            X3         Y3
             5         10            15         15            10         15
            10         20             5         10            25         25
            10         20            10         15            10         20
            20         30            25         30             5         10
            15         25            15         20             5         10
Sum:        60        105            70         90            55         80
Mean:       12         21            14         18            11         16

ΣX = 60 + 70 + 55 = 185
ΣY = 105 + 90 + 80 = 275

The computational work for ANCOVA for the data presented in Table 23.13 may be started
by arranging the above data as in Table 23.14.































Table 23.14 Computational Arrangement for ANCOVA

Group A
           X1     Y1    X1Y1     X1²     Y1²
            5     10      50      25     100
           10     20     200     100     400
           10     20     200     100     400
           20     30     600     400     900
           15     25     375     225     625
Sum:       60    105    1425     850    2425
Mean:      12     21

Group B
           X2     Y2    X2Y2     X2²     Y2²
           15     15     225     225     225
            5     10      50      25     100
           10     15     150     100     225
           25     30     750     625     900
           15     20     300     225     400
Sum:       70     90    1475    1200    1850
Mean:      14     18

Group C
           X3     Y3    X3Y3     X3²     Y3²
           10     15     150     100     225
           25     25     625     625     625
           10     20     200     100     400
            5     10      50      25     100
            5     10      50      25     100
Sum:       55     80    1075     875    1450
Mean:      11     16

ΣX = 60 + 70 + 55 = 185
ΣY = 105 + 90 + 80 = 275

Steps of computation

A. ANOVA of X scores:

(i) Correction ($C_x$):

$$C_x = \frac{(\Sigma X)^2}{N} = \frac{(185)^2}{15} = 2281.67$$

(ii) Total sum of squares ($SS_{tx}$):

$$SS_{tx} = \Sigma X^2 - C_x = (850 + 1200 + 875) - 2281.67 = 2925 - 2281.67 = 643.33$$

(iii) Between group sum of squares ($SS_{bx}$):

$$\begin{aligned}
SS_{bx} &= \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \frac{(\Sigma X_3)^2}{n_3} - C_x \\
        &= \frac{(60)^2}{5} + \frac{(70)^2}{5} + \frac{(55)^2}{5} - 2281.67 \\
        &= 2305 - 2281.67 = 23.33
\end{aligned}$$

(iv) Within group sum of squares ($SS_{wx}$) = $SS_{tx}$ - $SS_{bx}$
= 643.33 - 23.33 = 620

Table 23.15. Summary of ANOVA of X scores

Source of variation       SS        df                      MS       F
Between groups            23.33     k - 1 = 3 - 1 = 2       11.66
Within groups (error)    620.00     N - k = 15 - 3 = 12     51.67    0.22
Total                    643.33     14

F = 11.66/51.67 = 0.22; for df 2, 12: F at 0.05 = 3.88, F at 0.01 = 6.93

B. ANOVA of Y scores:

(i) Correction ($C_y$):

$$C_y = \frac{(\Sigma Y)^2}{N} = \frac{(105 + 90 + 80)^2}{15} = \frac{(275)^2}{15} = 5041.66$$

(ii) Total sum of squares ($SS_{ty}$):

$$SS_{ty} = \Sigma Y^2 - C_y = (2425 + 1850 + 1450) - 5041.66 = 5725 - 5041.66 = 683.34$$

(iii) Between group sum of squares ($SS_{by}$):

$$\begin{aligned}
SS_{by} &= \frac{(\Sigma Y_1)^2}{n_1} + \frac{(\Sigma Y_2)^2}{n_2} + \frac{(\Sigma Y_3)^2}{n_3} - C_y \\
        &= \frac{(105)^2}{5} + \frac{(90)^2}{5} + \frac{(80)^2}{5} - 5041.66 \\
        &= 5105 - 5041.66 = 63.34
\end{aligned}$$

(iv) Within group sum of squares ($SS_{wy}$) = $SS_{ty}$ - $SS_{by}$
= 683.34 - 63.34 = 620.00

Table 23.16. Summary of ANOVA of Y Scores

Source of variation       SS        df                      MS       F
Between groups            63.34     k - 1 = 3 - 1 = 2       31.67
Within groups (error)    620.00     N - k = 15 - 3 = 12     51.67    0.61
Total                    683.34     14

F = 31.67/51.67 = 0.61; for df 2, 12: F at 0.05 = 3.88, F at 0.01 = 6.93

C. Analysis of covariance

(i) Total sum of products ($SP_t$):

$$\begin{aligned}
SP_t &= (\Sigma X_1 Y_1 + \Sigma X_2 Y_2 + \Sigma X_3 Y_3) - \frac{\Sigma X \cdot \Sigma Y}{N} \\
     &= (1425 + 1475 + 1075) - \frac{185 \times 275}{15} \\
     &= 3975 - 3391.66 = 583.34
\end{aligned}$$

(ii) Sum of products between groups ($SP_b$):

$$\begin{aligned}
SP_b &= \frac{\Sigma X_1 \cdot \Sigma Y_1}{n_1} + \frac{\Sigma X_2 \cdot \Sigma Y_2}{n_2} + \frac{\Sigma X_3 \cdot \Sigma Y_3}{n_3} - \frac{\Sigma X \cdot \Sigma Y}{N} \\
     &= \frac{60 \times 105}{5} + \frac{70 \times 90}{5} + \frac{55 \times 80}{5} - \frac{185 \times 275}{15} \\
     &= 3400 - 3391.66 = 8.34
\end{aligned}$$

(iii) Sum of products within groups ($SP_w$) = $SP_t$ - $SP_b$
= 583.34 - 8.34 = 575

For convenience, the total sum of products (SP) may be arranged with the sums of squares for
both X and Y variates as under:

Table 23.17. Summary of sum of squares of X and Y and that of sum of products

                           Total      Within     Between
Sum of products (XY)       583.34     575         8.34
Sum of squares (X)         643.33     620        23.33
Sum of squares (Y)         683.40     620.07     63.33

(iv) Adjusted SS for Y ($SS_{y.x}$)

$$\text{Total adjusted } SS = SS_{ty} - \frac{(SP_t)^2}{SS_{tx}} = 683.40 - \frac{(583.34)^2}{643.33} = 683.40 - 528.94 = 154.46$$

$$\text{Within adjusted } SS = SS_{wy} - \frac{(SP_w)^2}{SS_{wx}} = 620.07 - \frac{(575)^2}{620} = 620.07 - 533.26 = 86.81$$

Between adjusted SS = Total - Within
= 154.46 - 86.81 = 67.65

Table 23.18. Summary of covariance analysis

Source of variation       SS        df                               MS       F
Between groups            67.65     k - 1 = 3 - 1 = 2                33.82
Within groups (error)     86.81     k(n - 1) - 1 = 3(5 - 1) - 1 = 11*  7.89   4.29
Total                    154.46     13

F = 33.82/7.89 = 4.29; F (df 2, 11) at 0.05 = 3.98
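The whole ANCOVA computation for this second example can also be carried out directly from raw scores, as in the sketch below. The scores are the Table 23.13 values as read here (an assumption wherever the printed rows are unclear; their sums agree with the totals given in the text), and the helper names `partition` and `ancova_f` are hypothetical. Because the code carries full precision, its intermediate sums of squares may differ from the printed ones in the first or second decimal place, while the adjusted F agrees with the table.

```python
def partition(groups):
    """Total and within-groups SS_x, SS_y and SP for a list of (x_scores, y_scores)."""
    xs_all = [x for xs, _ in groups for x in xs]
    ys_all = [y for _, ys in groups for y in ys]
    n_total = len(xs_all)
    c_x = sum(xs_all) ** 2 / n_total
    c_y = sum(ys_all) ** 2 / n_total
    c_xy = sum(xs_all) * sum(ys_all) / n_total
    total = {
        "ss_x": sum(x * x for x in xs_all) - c_x,
        "ss_y": sum(y * y for y in ys_all) - c_y,
        "sp": sum(x * y for x, y in zip(xs_all, ys_all)) - c_xy,
    }
    between = {
        "ss_x": sum(sum(xs) ** 2 / len(xs) for xs, _ in groups) - c_x,
        "ss_y": sum(sum(ys) ** 2 / len(ys) for _, ys in groups) - c_y,
        "sp": sum(sum(xs) * sum(ys) / len(xs) for xs, ys in groups) - c_xy,
    }
    within = {key: total[key] - between[key] for key in total}
    return total, within, n_total


def ancova_f(groups):
    """Adjusted between-groups F with one covariate (error df = N - k - 1)."""
    total, within, n_total = partition(groups)
    k = len(groups)
    adj_total = total["ss_y"] - total["sp"] ** 2 / total["ss_x"]
    adj_within = within["ss_y"] - within["sp"] ** 2 / within["ss_x"]
    adj_between = adj_total - adj_within
    return (adj_between / (k - 1)) / (adj_within / (n_total - k - 1))


# (initial scores X, final scores Y) for Groups A, B and C.
example_2 = [
    ([5, 10, 10, 20, 15], [10, 20, 20, 30, 25]),
    ([15, 5, 10, 25, 15], [15, 10, 15, 30, 20]),
    ([10, 25, 10, 5, 5], [15, 25, 20, 10, 10]),
]
total_2, within_2, n_2 = partition(example_2)
f_adjusted_2 = ancova_f(example_2)
print(round(within_2["sp"], 2), round(f_adjusted_2, 2))
```

Working from raw scores avoids copying intermediate totals by hand, which is where small discrepancies in such tables usually arise.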
Note: 1 df is lost because of regression of Y on X.

Discussion

The steps involved in the computation of Numerical Example II are essentially the same as those involved in the computation of Numerical Example I. Therefore, those steps are not being explained here.

It is clear that the F test applied to the initial scores (X) is not significant even at 0.05 level. It means that the three groups don't differ significantly on the X test. Likewise, the F test applied to the final scores (Y) is also not significant at 0.05 level. Again, it is concluded that the three groups don't differ significantly on the Y test also. However, the analysis carried out on the Y measure by adjusting the scores by the initial X scores indicated a significant result. In other words, the adjusted F of 4.29 is found to be significant at 0.05 level. Thus the three groups differ significantly when the groups have been adjusted for initial differences in X. Hence, the comparisons among means of the groups must be tested.


But before testing mean differences of the three groups, let us determine whether the regression of Y on X is significant. For this, the total sum of squares of Y measures is divided into two components: one corresponding to those parts of Y scores which can be predicted from the X scores and the other (residual) corresponding to those parts of Y scores which are independent of X. The sum of squares that can be predicted from the X scores is obtained by the formula:

$$\text{Sum of squares due to regression} = \frac{(SP_t)^2}{SS_{tx}} = \frac{(583.34)^2}{643.33} = 528.94$$

Total sum of squares of Y = 683.40

Residual = 683.40 - 528.94 = 154.46


Table 23.19 ANOVA for the regression of Y on X

Source               SS        df     MS        F
Due to regression    528.94    1      528.94    44.52
Residual             154.46    13     11.88
Total                683.40    14

$$F = \frac{528.94}{11.88} = 44.52$$

F at 0.05 = 4.67; F at 0.01 = 9.07 (df = 1, 13)


Out of the total variation of Y scores of 683.40, 528.94 is due to those parts of Y scores which can be predicted from X scores and 154.46 is due to those parts of Y scores which are, in fact, independent of X. The test of significance has been carried out in Table 23.19. The obtained F value is 44.52, which far exceeds the critical value of 9.07 for df 1 and 13 at 0.01 level. Thus H0 (null hypothesis) is rejected and it is concluded that the regression of Y on X is highly significant. This provides evidence for the fact that X and Y scores are significantly associated.
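The partition of the Y sum of squares into a regression part and a residual part can be reproduced as below. The sketch assumes only the three summary values already used in the text (total SP, total SS of X, total SS of Y); the variable names are illustrative.

```python
sp_total = 583.34      # total sum of products
ss_x_total = 643.33    # total sum of squares of X
ss_y_total = 683.40    # total sum of squares of Y
n_total = 15           # total number of subjects

ss_regression = sp_total ** 2 / ss_x_total   # part of SS_y predictable from X
ss_residual = ss_y_total - ss_regression     # part of SS_y independent of X

df_regression = 1
df_residual = n_total - 2
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)

print(round(ss_regression, 2), round(ss_residual, 2), round(f_ratio, 2))
```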


Now let us test the mean differences of the three groups by using the t test. The individual Y means can be adjusted for differences in X means by the following formula:

$$M_{y.x} = M_y - b_w (M_x - G_{M_x}) \qquad (23.24)$$

where,

M_y.x = Adjusted mean of Y when X is kept constant
M_y = Mean of uncorrected Y scores
M_x = Mean of X scores
b_w = Regression coefficient of within groups (SP_w / SS_wx)
G_Mx = Grand mean of X scores

Here,

$$b_w = \frac{SP_w}{SS_{wx}} = \frac{575}{620} = 0.93$$

$$G_{M_x} = \frac{\Sigma X}{N} = \frac{185}{15} = 12.33$$


Now by applying formula 23.24, the adjusted Y means for each of the three groups can be calculated as given below.

For Group A:

$$M_{y.x} = 21 - 0.93(12 - 12.33) = 21 - 0.93(-0.33) = 21 + 0.306 = 21.306 = 21.31 \quad (M_1)$$

For Group B:

$$M_{y.x} = 18 - 0.93(14 - 12.33) = 18 - 1.55 = 16.45 \quad (M_2)$$

For Group C:

$$M_{y.x} = 16 - 0.93(11 - 12.33) = 16 - 0.93(-1.33) = 16 + 1.24 = 17.24 \quad (M_3)$$


Now we proceed to test differences between the three means by computing the t test. The formula is:

$$t = \frac{\text{mean difference}}{SE_D}$$

SE_D of any two adjusted means can be computed by using the following formula:

$$SE_D = \sqrt{MS_w \left( \frac{1}{N_1} + \frac{1}{N_2} \right)} \qquad (23.25)$$

where,

MS_w = Mean squares of within groups (adjusted)
N_1 = Number of subjects in the first group compared
N_2 = Number of subjects in the second group compared

Here, df = k(n-1) - 1 = 3(5-1) - 1 = 12 - 1 = 11

where, k = number of groups to be compared
n = number of subjects in each group being compared

$$SE_D = \sqrt{7.89 \left( \frac{1}{5} + \frac{1}{5} \right)} = \sqrt{3.156} = 1.77$$

The computed value of t is:

$$\text{(i)} \quad \frac{M_1 - M_2}{SE_D} = \frac{21.31 - 16.45}{1.77} = \frac{4.86}{1.77} = 2.75 \quad \text{(Significant at 0.05 level), } df = 11$$

$$\text{(ii)} \quad \frac{M_2 - M_3}{SE_D} = \frac{16.45 - 17.24}{1.77} = \frac{-0.79}{1.77} = -0.45 \quad \text{(Not significant), } df = 11$$

$$\text{(iii)} \quad \frac{M_1 - M_3}{SE_D} = \frac{21.31 - 17.24}{1.77} = \frac{4.07}{1.77} = 2.30 \quad \text{(Significant at 0.05 level), } df = 11$$

The obtained results show that Group A and Group B differ significantly. Likewise, Group A and Group C differ significantly. However, Group B does not differ significantly from Group C.
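The adjusted means and the pairwise t ratios can be reproduced with the sketch below. It assumes the group means quoted in the worked example (X means 12, 14, 11; Y means 21, 18, 16), the adjusted within-groups mean square of 7.89, and a two-tailed critical t of 2.201 for df = 11 taken from standard tables. The code keeps the unrounded regression coefficient, so the t values differ from the hand-computed ones in the second decimal, while the significance decisions are the same.

```python
from math import sqrt

b_w = 575 / 620                  # within-groups regression coefficient SP_w / SS_wx
grand_mean_x = 185 / 15          # grand mean of X
groups = {"A": (12, 21), "B": (14, 18), "C": (11, 16)}   # (mean X, mean Y)

# Formula 23.24: adjust each Y mean for the group's deviation on X
adjusted = {g: my - b_w * (mx - grand_mean_x) for g, (mx, my) in groups.items()}

ms_within_adj = 7.89             # adjusted within-groups mean square
n_per_group = 5
se_d = sqrt(ms_within_adj * (1 / n_per_group + 1 / n_per_group))   # formula 23.25

def t_ratio(g1, g2):
    """t for the difference between two adjusted means."""
    return (adjusted[g1] - adjusted[g2]) / se_d

T_CRITICAL = 2.201               # two-tailed, 0.05 level, df = 11 (standard tables)
for pair in [("A", "B"), ("B", "C"), ("A", "C")]:
    t = t_ratio(*pair)
    print(pair, round(t, 2), "significant" if abs(t) > T_CRITICAL else "not significant")
```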





In Table 23.20, an example of computation of F test from repeated measures (within-group design) has been provided. A researcher has obtained the sales performance of 5 new employees in a building construction company. To see if there is a significant trend towards improvement, the number of homes sold is recorded each month for 3 months of employment. The researcher wants to answer the question: Is there a significant change in sales performance with more work experience?

Table 23.20. Computation of repeated measures ANOVA

Person          Month 1    Month 2    Month 3    Total
A               1          3          4          8
B               4          6          8          18
C               3          3          5          11
D               2          4          7          13
E               0          4          6          10
X̄ (or mean):    2          4          6          60
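Computation:

(i) Correction Term (C):

$$C = \frac{(\Sigma X)^2}{N} = \frac{(60)^2}{15} = 240$$

(ii) Total SS (Sum of squares):

$$\text{Total SS} = (1^2 + 4^2 + 3^2 + \ldots + 5^2 + 7^2 + 6^2) - C = 306 - 240 = 66$$

(iii) Between subject SS:

$$\text{Between subject SS} = \left( \frac{8^2}{3} + \frac{18^2}{3} + \frac{11^2}{3} + \frac{13^2}{3} + \frac{10^2}{3} \right) - C = 259.3 - 240 = 19.3$$

(iv) Within subject SS = Total SS - Between subject SS = 66 - 19.3 = 46.7

(v) Between treatment SS:

$$\text{Between treatment SS} = \left( \frac{10^2}{5} + \frac{20^2}{5} + \frac{30^2}{5} \right) - C = 280 - 240 = 40$$

(vi) Residual (Error) = Within subject SS - Between treatment SS = 46.7 - 40 = 6.7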
Table 23.21. Summary of ANOVA for single factor experiment with repeated measures

Source of variation      SS      df                       MS      F
Between subjects         19.3    n-1 = 5-1 = 4
Within subjects          46.7    N-n = 15-5 = 10
  Treatment              40      k-1 = 3-1 = 2            20      23.81
  Residual (error)       6.7     (n-1)(k-1) = 4 x 2 = 8   0.84
Total                    66      14

$$F = \frac{\text{Between treatment MS}}{\text{Error MS}} = \frac{20}{0.84} = 23.81$$

df for numerator = 2; df for denominator = 8
F at 0.05 = 4.46; F at 0.01 = 8.65

The results of the computation have been summarized in Table 23.21. Between subject SS and within subject SS (with their df) are set in bold type. This indicates that the total sum of squares has only two components: between subjects and within subjects sum of squares. The within subject sum of squares is further divided into treatment sum of squares and residual sum of squares. The F ratio is calculated by dividing between treatment MS by MS residual (error). The F ratio is significant and therefore, H0 (null hypothesis) is rejected.
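The partition of the sums of squares can be checked with the sketch below. Two assumptions are made. The individual scores are a set consistent with the person totals, the monthly means and the sum of squared scores (306) reported for Table 23.20. The degrees of freedom follow the usual convention for this design: k - 1 for treatments and (n - 1)(k - 1) for the residual, where k is the number of treatments and n the number of subjects.

```python
# Homes sold by each person in months 1-3 (consistent with the totals in Table 23.20)
scores = {
    "A": [1, 3, 4],
    "B": [4, 6, 8],
    "C": [3, 3, 5],
    "D": [2, 4, 7],
    "E": [0, 4, 6],
}

n = len(scores)                          # subjects
k = len(next(iter(scores.values())))     # treatments (months)
all_scores = [x for row in scores.values() for x in row]
N = len(all_scores)

correction = sum(all_scores) ** 2 / N
ss_total = sum(x * x for x in all_scores) - correction
ss_between_subjects = sum(sum(row) ** 2 for row in scores.values()) / k - correction
ss_within_subjects = ss_total - ss_between_subjects

treatment_totals = [sum(row[j] for row in scores.values()) for j in range(k)]
ss_treatment = sum(t * t for t in treatment_totals) / n - correction
ss_residual = ss_within_subjects - ss_treatment

df_treatment = k - 1
df_residual = (n - 1) * (k - 1)
f_ratio = (ss_treatment / df_treatment) / (ss_residual / df_residual)

print(ss_total, round(ss_between_subjects, 1), round(ss_within_subjects, 1))
print(ss_treatment, round(ss_residual, 1), round(f_ratio, 2))
```

Because the sketch does not round the residual mean square to 0.84 before dividing, its F ratio can differ in the first decimal from a hand computation that does.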


Post hoc test with Repeated measures ANOVA

To determine exactly where the significant difference exists, the researcher must follow the ANOVA with post hoc tests. Like independent measures ANOVA, in repeated measures ANOVA Tukey's HSD can be computed as a post hoc test. The formula for Tukey's HSD in case of repeated measures ANOVA becomes:

$$HSD = q \sqrt{\frac{MS_{error}}{n}} \qquad (23.26)$$

where q is the value of the Studentized Range statistic, MS_error is the residual variance and n is the number of scores in each treatment.

In this experiment MS_error is 0.84 with df = 8. At 0.05 level of significance with k = 3, the value of q for these data with the help of Table 23.5 is 4.04.

$$HSD = 4.04 \sqrt{\frac{0.84}{5}} = 1.66$$


It means that the mean difference between any two samples must be at least 1.66 to be significant. Using this value, we can make the following conclusions:

Mean difference between 2 months and 1 month = 4 - 2 = 2 (significant)

Mean difference between 3 months and 1 month = 6 - 2 = 4 (significant) 

Mean difference between 3 months and 2 months = 6 - 4 = 2 (significant) 

All these three mean differences are significant in light of Tukey's HSD test that determines 
the minimum difference between two treatment means that is necessary for being significant. 
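The HSD criterion can be applied mechanically to every pair of treatment means, as in this sketch. The value q = 4.04 is the one quoted in the text; the dictionary of monthly means and the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

q = 4.04            # Studentized range value quoted in the text (k = 3, df = 8)
ms_error = 0.84     # residual mean square
n = 5               # scores in each treatment

hsd = q * sqrt(ms_error / n)

means = {"Month 1": 2, "Month 2": 4, "Month 3": 6}
significant = {
    (a, b): abs(means[a] - means[b]) >= hsd
    for a, b in combinations(means, 2)
}

print(round(hsd, 2))
for pair, flag in significant.items():
    print(pair, "significant" if flag else "not significant")
```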


4. Pearson r 


Of all the measures of correlation the Pearson r, named after Prof. Karl Pearson, is one of the most common methods of assessing the association between two variables under study. The Pearson correlation measures the degree and direction of linear relationship between two interval-level variables. Pearson r represents the extent to which the same individuals, events, etc., occupy the same relative position on two variables. It is a symmetric statistic that makes no distinction between independent variable and dependent variable. It is also known as Pearson product-moment correlation and abbreviated to r. The size of Pearson r varies from +1 through 0 to -1. Thus, correlation coefficients have the limits of +1 and -1. A coefficient of +1 indicates perfect positive correlation and a coefficient of -1 indicates perfect negative correlation. The coefficient of correlation tells us two things. First, it indicates the magnitude of relationship. A correlation coefficient of +0.90 or -0.90 gives the same information about the magnitude of correlation. The sign makes no variation in the size of the correlation. Second, it gives an indication regarding the direction of the correlation. A positive correlation indicates a similar trend of relationship between two variables, that is, as one increases, the other also increases or as one decreases, the other also decreases. Consider the relationship between intelligence test scores and classroom achievement. Generally as the intelligence test score is raised, classroom achievement is also raised. And, therefore, the relationship between these two variables is positive. Likewise, consider the relation between fatigue and output. As fatigue increases, output decreases. Here the relationship is negative because as one increases, the other decreases. Sometimes, the relationship is not consistent. And in this situation the coefficient of correlation is likely to be zero.


The Pearson product-moment correlation has two important assumptions:

1. The relationship between X and Y variables should be linear. A linear relationship refers to the tendency of the data, when plotted, to follow a straight line as closely as possible. Although generally this is determined by the inspection of the scatter diagram or correlation table, there are some statistical tests through which one can test whether or not the relationship is linear.

Fig. 23.7 (a) and (b) non-homoscedastic; (c) homoscedastic and linear

2. The second assumption is of homoscedasticity (homo means 'like' and scedasticity means scatteredness). Defined statistically, we can say that homoscedasticity refers to the fact that standard deviations (or variances) for columns and rows in the scatter diagram are equal or nearly equal. Figure 23.7 illustrates homoscedastic and non-homoscedastic distributions.

In this figure we have three diagrams. In diagram (a) the variance of the distribution near the centre is smaller than the variances near both extremes, and hence, the distribution is non-homoscedastic. In diagram (b) the variances near the bottom extreme are lower than the variances at middle or at top extreme, and, therefore, the distribution is non-homoscedastic. In diagram (c), the variances are equal throughout and the relationship is linear.

As we know, the Pearson correlation assesses the degree and direction of linear relation between two interval-level variables. Pearson correlation, identified by the letter r, is computed by:

$$r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \frac{\text{covariability of X and Y}}{\text{variability of X and Y separately}} \qquad (23.27)$$

In any distribution when there is a perfect linear relationship, every change in X is accompanied by a corresponding change in Y. In such a case, the covariability perfectly reflects the total variability of X and Y separately. The overall result is a perfect correlation of 1.00. On the other hand, when there is no linear relationship, a change in the X variable does not produce any predictable change in Y. In this case, there will be no covariability and the resulting correlation is zero. Pearson r can be calculated by several formulas. In this book, we shall illustrate the calculation of Pearson r by the raw score formula or machine formula. The equation is:

$$r = \frac{N \Sigma XY - \Sigma X \Sigma Y}{\sqrt{[N \Sigma X^2 - (\Sigma X)^2][N \Sigma Y^2 - (\Sigma Y)^2]}} \qquad (23.28)$$








where, r = Pearson product-moment correlation coefficient; N = number of scores; X = score in X variable; and Y = score in Y variable.

Table 23.22 presents the scores of 10 students who were administered an intelligence test (X) and an anxiety test (Y). Pearson r has also been calculated from the ten pairs of scores. The significance of the obtained r is tested with the help of a table (see the table given in the Appendix; Downie and Heath 1970, 376). The obtained value of r is less than the value required at even 0.05 level of significance. Hence, the null hypothesis is accepted and it is concluded that the scores on the intelligence test and the scores on the anxiety test are not correlated. The sign makes no difference in the magnitude of the correlation.

When the data are arranged in a bivariate distribution as is the case in the scatter diagram or in the correlation table, r should be calculated by the following formula:

$$r = \frac{\dfrac{\Sigma x'y'}{N} - C_x C_y}{(\sigma_{x'})(\sigma_{y'})} \qquad (23.29)$$

where, x' = deviation of scores from mean on X test; y' = deviation of scores from mean on Y test; N = total number of frequencies; C_x = correction in X-series scores, and C_y = correction in Y-series scores.

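Table 23.22 Pearson r by raw-score method

X       Y       X²      Y²      XY
5       20      25      400     100
10      8       100     64      80
6       15      36      225     90
3       13      9       169     39
8       16      64      256     128
12      20      144     400     240
13      13      169     169     169
20      11      400     121     220
15      10      225     100     150
10      12      100     144     120

ΣX = 102    ΣY = 138    ΣX² = 1272    ΣY² = 2048    ΣXY = 1336

$$r = \frac{N \Sigma XY - \Sigma X \Sigma Y}{\sqrt{[N \Sigma X^2 - (\Sigma X)^2][N \Sigma Y^2 - (\Sigma Y)^2]}} = \frac{(10)(1336) - (102)(138)}{\sqrt{[(10)(1272) - (102)^2][(10)(2048) - (138)^2]}}$$

$$= \frac{13360 - 14076}{\sqrt{(2316)(1436)}} = \frac{-716}{\sqrt{3325776}} = \frac{-716}{1823.67} = -0.39$$

df = N - 2 = 10 - 2 = 8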
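The raw-score formula translates directly into code. The sketch below is illustrative: the function name `pearson_r` is an assumption, and the ten pairs are chosen to be consistent with the column totals reported for Table 23.22 (ΣX = 102, ΣY = 138, ΣX² = 1272, ΣY² = 2048, ΣXY = 1336).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r by the raw-score (machine) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Intelligence (X) and anxiety (Y) scores, consistent with the totals of Table 23.22
x = [5, 10, 6, 3, 8, 12, 13, 20, 15, 10]
y = [20, 8, 15, 13, 16, 20, 13, 11, 10, 12]

r = pearson_r(x, y)
df = len(x) - 2
print(round(r, 2), df)
```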


While interpreting a correlation coefficient, researchers must take into account two different circumstances which can cause a higher or lower correlation. One circumstance arises when one or relatively few persons have a pair of scores that is markedly different from the rest of the sample's scores. Such individuals' pairs of scores are called outliers. In fact, the actual magnitude of a correlation can be distorted by one or more outliers. The second circumstance is one where, all other things being equal, there is homogeneity in the group of scores. In such a situation, the magnitude of Pearson r will be lower. In simple words, if there is a smaller range of scores, there will be a smaller value of r.

Researchers have pointed out many ways of interpreting a correlation coefficient, depending upon the purpose of the researchers and the circumstances that influence the correlation's magnitude. One popular method uses a crude criterion for interpreting the magnitude of a correlation. This is presented below.













Value of r Interpretation 


0.80 to 1.00 High to very high 
0.60 to 0.80 Substantial 

0.40 to 0.60 Moderate 

0.20 to 0.40 Low 

0.00 to 0.20 Negligible 


Another method of interpreting the correlation coefficient is on the basis of statistical significance of correlation based on the concept of sampling error. Still another way of interpreting the correlation coefficient is in terms of variance. In fact, the variance of the scores that the researcher wants to predict is divided into two parts: one that is explained by the predictor or treatment variable and the other that is explained by other factors (generally unknown) including sampling error. The researcher finds the percentage of explained variance by calculating r², popularly called the coefficient of determination. Then, the proportion of variance not explained by the predictor variable is 1 - r². Let us take an example to illustrate this fact. Suppose the researcher wants to predict general academic achievement (Y variable) on the basis of IQ (X variable) and he obtains r of 0.65 between IQ and general academic achievement. He can use this correlation to find r² = (0.65)² = 0.42. This means that 42 per cent of variance in general academic achievement is predictable from the variance of IQ. It also means that 58 per cent of the variance of general academic achievement is due to factors other than IQ such as home environment, school environment, level of motivation, etc.

Still another way of interpreting Pearson r is in terms of standard error of estimate and coefficient of alienation denoted by letter K. As we know, in predicting X on the basis of Y or Y on the basis of X, we also compute the standard error of estimate for gauging the predictive strength. In fact, standard error of estimate is the standard deviation of the difference between the actual values of the predicted variable and those estimated from the regression equation. Briefly, it is simply the standard deviation of the errors of estimate. The formula for standard error of estimate is:

$$SE(est)_{y.x} = \sigma_y \sqrt{1 - r_{xy}^2} \quad \text{(Y on the basis of X)} \qquad (23.30)$$

$$SE(est)_{x.y} = \sigma_x \sqrt{1 - r_{xy}^2} \quad \text{(X on the basis of Y)} \qquad (23.31)$$

Researchers have shown that it is possible to gauge from $\sqrt{1 - r^2}$ the predictive strength of an r. This $\sqrt{1 - r^2}$ is called the coefficient of alienation and is symbolized by letter K.

$$K = \sqrt{1 - r^2}$$


K measures absence of relationship between two variables just as r measures the presence of relationship between two variables. In simple words, when K = 1, r = 0.00 and when K = 0.00, r = 1.00. Obviously, the larger the value of K, the smaller the extent of relationship and the less precise and more inaccurate will be the prediction from X to Y and vice versa.

In a nutshell, it can be said that the coefficient of alienation provides us a measure of the extent to which errors in our estimates (while making prediction) are reduced. When r = 1.00, the coefficient of alienation becomes 0.00 and, in fact, we make no errors in predicting one score from the other. When r = 0.00, the coefficient of alienation is 1.00 and therefore the standard error of estimate becomes the same as the standard deviation of the marginal frequencies. When r = 0.50, the K (coefficient of alienation) is 0.87, which indicates that our errors of estimate are 0.87 as large as, or 13% (100% - 87% = 13%) smaller than, they would be if there were no correlation between X and Y variables.
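The two variance-based interpretations of r discussed above, the coefficient of determination (r²) and the coefficient of alienation (K), can be computed as in this sketch. The function names are illustrative.

```python
from math import sqrt

def coefficient_of_determination(r):
    """Proportion of variance in one variable predictable from the other."""
    return r ** 2

def coefficient_of_alienation(r):
    """K = sqrt(1 - r^2): ratio of the standard error of estimate to the SD."""
    return sqrt(1 - r ** 2)

r_iq = 0.65
explained = coefficient_of_determination(r_iq)
print(round(explained, 2), round(1 - explained, 2))   # explained / unexplained share

for r in (0.0, 0.5, 1.0):
    print(r, round(coefficient_of_alienation(r), 2))
```

For r = 0.50 the sketch gives K close to 0.87, so the errors of estimate shrink by only about 13 per cent even though the correlation looks moderate.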

There are three important applications of correlation.

(i) Prediction: If two variables are correlated in some systematic way, it is possible to use one of the variables for making accurate prediction about the other. If the correlation is perfect (±1.00), this prediction is hundred per cent accurate.

(ii) Validity: Suppose the researcher develops a new test and he wants to validate it, that is, he wants to know if the test is truly measuring what it claims to measure. One way of knowing about this validity is to correlate the new test with some other test that is measuring what the new test is measuring. For example, a newly developed intelligence test may be correlated against another intelligence test developed earlier. If the correlation is high, it is said that the newly developed intelligence test has sufficient degree of validity.

(iii) Theory verification: Correlation is also used in the task of theory verification. Many psychological theories make specific predictions about the relationship between two variables. For example, a social psychologist may develop a theory predicting a relationship between personality type and prosocial behaviour. Likewise, a physiological psychologist may predict a relationship between brain size and learning ability. In each case, the prediction of the theory could be tested by calculating the correlation between the two concerned variables.

5. Partial correlation and Multiple correlation

When X and Y become correlated with each other, there is always the possibility that this correlation is due to the association between each of the two variables and a third variable. For example, among a group of schoolchildren of different ages, the researcher may find a high correlation between size of vocabulary and height. In fact, this correlation may not display a genuine correlation between size of vocabulary (X) and height (Y) but may result from the fact that both vocabulary size and height are related to a third variable, that is, age (Z). Likewise, if the researcher has found a high correlation between academic achievement and the time devoted to study, it may not be genuine because it may result from the fact that both variables are related to a third variable, that is, intelligence.

In designing any experiment, the researcher has the alternative of either introducing experimental control for eliminating the influence of the third variable or using statistical methods to control or eliminate the influence of the third variable. Suppose the experimenter wishes to study the relation between ability to memorize (X) and ability to solve certain kinds of problems (Y). Both these variables are related to intelligence (Z). Therefore, for determining the genuine correlation between ability to memorize and ability to solve problems, it is essential that the third factor, that is, intelligence, be controlled. If the experimenter wants to control intelligence through experimental control, he can choose subjects with equal intelligence. But if, somehow, experimental control is not feasible, then statistical control can be applied.

Partial correlation is a statistical method that allows the researcher to have a measure of the relationship between two variables by controlling the impact of the third variable.


In simple words, partial correlation coefficient is a measure of linear relationship between two interval-level variables, controlling for one or more other variables. Controlling a variable is also called partialing out or adjusting for or holding a variable constant. Like the ordinary correlation coefficient, partial correlation ranges from +1.00 to -1.00. When the correlation between X and Y is found by controlling only one other variable, it is called first-order partial correlation but when two variables are controlled, it is called second-order partial correlation.

The partial correlation between variables X and Y controlling for Z can be understood in terms of the residuals of the regressions of X on Z and Y on Z. In simple words, one can think of Z as the independent variable for predicting X and Y in two separate regressions. In this process, two sets of residuals are computed. One set of residuals represents the variation in X not explained by Z and the other set of residuals represents the variation in Y not explained by Z. The partial correlation coefficient between X and Y controlling for Z can be easily computed by correlating these two sets of residuals.

The formulas for computing partial correlation are as under.

First-order partial correlation:

$$r_{xy.z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}} \qquad (23.33)$$

Second-order partial correlation:

$$r_{xy.za} = \frac{r_{xy.z} - r_{xa.z} r_{ya.z}}{\sqrt{(1 - r_{xa.z}^2)(1 - r_{ya.z}^2)}} \qquad (23.34)$$

(Computational equations for higher-order partial correlation coefficients will not be discussed here. The order of a partial correlation coefficient indicates the number of variables controlled.)

where, X = Independent variable;
Y = Dependent variable;
Z = Control variable;
A = Control variable.


Let us take an example to illustrate the computation of first-order partial correlation.

(a) Zero-order correlation between ability to memorize (X) and ability to solve certain problems (Y) = r_xy = 0.80
(b) Zero-order correlation between X and intelligence (Z) = r_xz = 0.50
(c) Zero-order correlation between Y and Z = r_yz = 0.60

$$r_{xy.z} = \frac{0.80 - (0.50)(0.60)}{\sqrt{(1 - 0.50^2)(1 - 0.60^2)}} = \frac{0.80 - 0.30}{\sqrt{(0.75)(0.64)}} = \frac{0.50}{0.69} = 0.72$$








Note: Residuals mean the size and direction of error in prediction.

It shows that if all subjects had been of equal intelligence, then the correlation between ability to memorize and ability to solve certain problems would be 0.72 rather than 0.80.

However, there may arise a situation where the control variable (Z variable) is removed from one variable only, say X, and the residual of X is correlated with the other variable, Y. This type of correlation is called part correlation or semipartial correlation. The part correlation between Y and the residuals of X, with Z removed, is given by:

$$r_{y(x.z)} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{1 - r_{xz}^2}} \qquad (23.35)$$
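The first-order partial correlation, and the part (semipartial) correlation in its standard textbook form, can be computed from the three zero-order correlations as sketched below. The function names are illustrative, and the part correlation here assumes that Z is removed from X only.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling Z in both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def part_r(r_xy, r_xz, r_yz):
    """Part (semipartial) correlation of Y with X, Z removed from X only."""
    return (r_xy - r_xz * r_yz) / sqrt(1 - r_xz ** 2)

# Zero-order correlations from the memorizing / problem-solving example
r_xy, r_xz, r_yz = 0.80, 0.50, 0.60

print(round(partial_r(r_xy, r_xz, r_yz), 2))
print(round(part_r(r_xy, r_xz, r_yz), 2))
```

The part correlation is never larger in absolute value than the corresponding partial correlation, because its denominator omits the factor for the variable left unadjusted.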


part correlation i! 
75, then (he part COF 


ae wQHOand yy, m™ 
ir, OT OY 475 (OBO) , DY 0283 
Iriya JI: (72 0.54 


Application of part correlation generally arises where measurements (or scores) have been obtained prior to giving any experimental treatment and subsequent measurements are obtained on the same subjects or participants under experimental treatment.

Multiple correlation

Multiple correlation coefficient, denoted by letter R, in its simplest form is understood as the relationship between one variable and a combination of at least two other variables. The multiple correlation coefficient may be defined as a measure of linear relationship between a dependent variable and the combined effects of two or more independent variables. In simple words, R is the correlation between scores actually earned on the criterion variable and scores predicted on the criterion variable from the multiple regression equation. Making predictions in this situation is called multiple regression.

When we want to know the R between Y (dependent variable) and the combined effects of X1 and X2 (two independent variables), the formula for the multiple correlation coefficient becomes like this, where variable 1 is the dependent variable and variables 2 and 3 are the independent variables:

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}} \qquad (23.36)$$

Let us take an example.

Variable 1: Academic achievement
Variable 2: Study habit
Variable 3: Intelligence

Zero-order correlations between these three variables are as under:

r_12 = 0.50
r_13 = 0.60
r_23 = 0.40

$$R_{1.23} = \sqrt{\frac{(0.50)^2 + (0.60)^2 - 2(0.50)(0.60)(0.40)}{1 - (0.40)^2}} = \sqrt{0.4405} = 0.66$$
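The same computation in code, as a sketch for the two-predictor case only. The function name `multiple_r` is illustrative.

```python
from math import sqrt

def multiple_r(r12, r13, r23):
    """Multiple correlation of variable 1 with variables 2 and 3 combined."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return sqrt(r_squared)

# Academic achievement (1), study habit (2), intelligence (3)
R = multiple_r(0.50, 0.60, 0.40)
print(round(R, 2), round(R ** 2, 2))
```

Squaring R gives the share of variance in the dependent variable explained jointly by the two predictors.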
Multiple correlation coefficient R is interpreted in the same way as correlation coefficient r. It is a measure of relation between the dependent variable and the combined effect of the set of independent variables. Just as r² is a proportion, so also R² is a proportion. When R is squared, that is, R², it is called the coefficient of multiple determination, which indicates the proportion of total variance in the dependent variable that is explained jointly by two or more independent variables. In the above example, R is 0.66 and therefore R² = 0.44. It means that 44% of variance in academic achievement has been explained by study habit and intelligence and the remaining 56% has not been explained. Note that R² is never smaller than any single r² between the dependent variable and an independent variable.

Correlation and Causation

One of the most common errors is to assume that a correlation necessarily implies a cause-and-effect relationship between two variables. Our lives are constantly bombarded with reports of relationship between two variables. A few examples are: cigarette smoking is related to lung disease; alcohol consumption is related to birth defects; hard labour during examination by students is related to good grades; carrot consumption is related to good eyesight. Do these relationships indicate a cause-and-effect relationship between the two concerned variables? Does cigarette smoking cause lung disease or carrot consumption cause good eyesight? The answer is no. What, in fact, is being said is that correlational studies simply do not allow the inference of causation. Correlation is a necessary but not a sufficient condition to establish a causal relationship between two variables. Faulty causal inference from correlational data is called the post hoc fallacy.


NONPARAMETRIC STATISTICS 


The important nonparametric statistics which have been included in this book are as follows:
1. Chi-square (χ²) test
2. Mann-Whitney U test
3. Rank-difference methods (both rho and tau)
4. Kendall's partial rank order correlation
5. Coefficient of concordance (W)
6. Median test
7. Kruskal-Wallis H test
8. Friedman test

Each of these techniques has been discussed below.

Chi-square (χ²) Test


The chi-square is one of the most important nonparametric statistics, which is used for several purposes. This test was originally developed by Karl Pearson and is therefore also sometimes called the Pearson chi-square. Due to its uses for various purposes, Guilford (1956) has called it the general-purpose statistic. It is a nonparametric statistic because it involves no assumption regarding the normality of distribution or homogeneity of the variances. The chi-square test is used when the data are expressed in terms of frequencies or proportions or percentages. The basic ideas behind use of χ² are that (a) we have a theory of any kind concerning how our cases should be distributed, (b) we have a sample which shows how the cases actually are distributed, and (c) we want to know whether or not the differences between theoretical frequencies and observed frequencies are of such size that these differences might reasonably be attributed to the chance factor. The chi-square applies only to discrete data. However, any continuous data can be reduced to categories in such a way that they can be treated as discrete data and then, the application of chi-square is justified. The formula for calculating χ² is given below.





$$\chi^2 = \Sigma \left[ \frac{(f_o - f_e)^2}{f_e} \right] \qquad (23.37)$$

$$\chi^2 = \Sigma \left( \frac{f_o^2}{f_e} \right) - N \qquad (23.38)$$

where, χ² = chi-square; f_o = obtained or observed frequency; f_e = expected frequency or theoretical frequency; and N = total sum of observations.

The calculation of χ² with formula 23.38 requires less arithmetic than calculation of χ² with formula 23.37. Hence, it saves time.

Uses of the chi-square Test

There are several uses of the chi-square test.

First, chi-square may be used as a test of equal probability hypothesis. By equal probability hypothesis, we mean the probability of having the frequencies in all the given categories as equal. Suppose, for example, 100 students answer an item in an attitude scale. The item has five categories of response options: strongly agree, agree, neutral, disagree and strongly disagree. According to the equal probability hypothesis, the expected frequency of responses given by 100 students would be 20 in each. The chi-square test would test whether or not the equal probability hypothesis is tenable. If the value of the chi-square is significant, the equal probability hypothesis becomes untenable and if the value of the chi-square is not significant, the equal probability hypothesis becomes tenable.

The second use of the chi-square test is in testing the significance of the independence hypothesis. By independence hypothesis is meant that one variable is not affected by, or related to, another variable and hence, these two variables are independent. The chi-square is not a measure of the degree of relationship in such a situation. It merely provides an estimate of some factors other than chance (or sampling error), which account for the apparent relationship. Generally, in dealing with data related to independence hypothesis, they are first arranged in a contingency table. When observations on two variables are classified in a two-way table, data are called the contingency data and the table is known as the contingency table. Independence in a contingency table exists only when each tally exhibits a different event or individual.

The third important use of chi-square is in testing a hypothesis regarding the normal shape of a frequency distribution. When chi-square is used in this connection, it is commonly referred to as a test of goodness-of-fit.


The fourth use of chi-square is in testing the significance of several statistics. For example, for testing the significance of the phi-coefficient, coefficient of concordance and coefficient of contingency, we convert the respective values into chi-square values. If the chi-square value appears to be a significant one, we also take their original values as significant.

As an illustration, let us take an example of a 3 × 3 contingency table, which shows data of 200 students who were classified into three classes on the basis of their educational qualification (see Table 23.23). Their educational attainment is measured in course of study by classifying them as superior, average and inferior. Now, the question is: Is educational achievement related to educational qualification? The obtained data have been shown in Table 23.23. For the moment, omit the data given in parentheses because they indicate expected frequency and not observed frequency.

Table 23.23 The use of chi-square in a 3 × 3 contingency table

                 Superior    Average    Inferior    Total
Master           30 (25)     15 (15)     5 (10)       50
Bachelor         25 (25)     10 (15)    15 (10)       50
Intermediate     45 (50)     35 (30)    20 (20)      100
Total            100         60         40           200

The first step in calculating $\chi^2$ as a test of significance of independence, or of the relationship between educational achievement and educational qualification, is to compute the expected frequency. The null hypothesis is that these two variables are independent, and if this hypothesis is true, the expected frequencies should be as follows:

Cells of table       Expected frequency
Upper left           (100 × 50)/200 = 25
Upper middle         (60 × 50)/200 = 15
Upper right          (40 × 50)/200 = 10
Middle left          (100 × 50)/200 = 25
Middle middle        (60 × 50)/200 = 15
Middle right         (40 × 50)/200 = 10
Lower left           (100 × 100)/200 = 50
Lower middle         (60 × 100)/200 = 30
Lower right          (40 × 100)/200 = 20

After calculating the expected frequency for each cell, the chi-square may be calculated as shown below:

$f_o$    $f_e$    $f_o - f_e$    $(f_o - f_e)^2$    $(f_o - f_e)^2 / f_e$
30       25       +5             25                 1.00
15       15        0              0                 0.00
 5       10       -5             25                 2.50
25       25        0              0                 0.00
10       15       -5             25                 1.67
15       10       +5             25                 2.50
45       50       -5             25                 0.50
35       30       +5             25                 0.83
20       20        0              0                 0.00
Σ = 200                                             $\chi^2$ = 9.00

df = (r − 1)(k − 1) = (3 − 1)(3 − 1) = 2 × 2 = 4
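The worked example above can be checked with a short Python sketch. It is only an illustration: the function name `chi_square_independence` is hypothetical, and the cell counts are the observed frequencies as read from Table 23.23 and the computation table. It derives the expected frequencies from the marginal totals and evaluates both formula 23.37 and formula 23.38.

```python
# Observed frequencies as read from Table 23.23
# (rows: Master, Bachelor, Intermediate; columns: superior, average, inferior).
observed = [
    [30, 15, 5],
    [25, 10, 15],
    [45, 35, 20],
]


def chi_square_independence(table):
    """Return (chi2 by 23.37, chi2 by 23.38, df, expected) for an r x k table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency of a cell = (row total x column total) / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    pairs = [(fo, fe)
             for o_row, e_row in zip(table, expected)
             for fo, fe in zip(o_row, e_row)]
    chi2_definition = sum((fo - fe) ** 2 / fe for fo, fe in pairs)  # formula 23.37
    chi2_shortcut = sum(fo ** 2 / fe for fo, fe in pairs) - n       # formula 23.38
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2_definition, chi2_shortcut, df, expected


chi2_definition, chi2_shortcut, df, expected = chi_square_independence(observed)
print(round(chi2_definition, 2), round(chi2_shortcut, 2), df)
```

Both formulas should print the same value, which is the point made in the text about formula 23.38 being an arithmetic shortcut rather than a different statistic.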




















Entering the probability table of chi-square, we find that for df = 4 the value of chi-square at the 0.05 level should be 9.488. As the obtained value of chi-square is below it (p > 0.05), the null hypothesis is retained: the two variables, namely, educational qualification and educational attainment in course of study, are found to be independent. For calculating df, as noted above, the formula is df = (r − 1)(k − 1), where r = the number of rows and k = the number of columns.

By applying formula 23.38 the process of computing $\chi^2$ becomes much simpler as many of the arithmetical calculations are avoided. Thus we find $\chi^2$ in a simpler way from the data of Table 23.23 as:

$$\chi^2 = \left(\frac{30^2}{25} + \frac{15^2}{15} + \frac{5^2}{10} + \frac{25^2}{25} + \frac{10^2}{15} + \frac{15^2}{10} + \frac{45^2}{50} + \frac{35^2}{30} + \frac{20^2}{20}\right) - 200$$

$$= (36 + 15 + 2.5 + 25 + 6.67 + 22.5 + 40.5 + 40.83 + 20) - 200 = 209 - 200 = 9$$

Thus, we get exactly the same value of chi-square as was obtained with the help of formula 23.37.

Chi-square in a 2 × 2 table: When data have been arranged in a 2 × 2 contingency table (where df = 1), we need not calculate the expected frequency in the manner described above. In such a situation the chi-square can be directly calculated with the help of the following equation:

$$\chi^2 = \frac{N\left(|AD - BC|\right)^2}{(A + B)(C + D)(A + C)(B + D)} \qquad (23.39)$$

where A, B, C and D = symbols for the frequencies of the four cells in a 2 × 2 table; N = total number of frequencies; and the bars (| |) indicate that in subtracting BC from AD, the sign is ignored.

Suppose the researcher wants to know whether or not two given items in a test are independent. Both items have been answered in "Yes" or "No" form. The test was administered to a sample of 400 students and the obtained data were as follows:

Table 23.24 Chi-square in a 2 × 2 table

                          Item No. 6
                      Yes         No         Total
Item No. 10   Yes     180 (A)     120 (B)    300
              No       90 (C)      10 (D)    100
Total                 270         130        400

According to the formula:

$$\chi^2 = \frac{400\left[\,|(180)(10) - (120)(90)|\,\right]^2}{(300)(100)(270)(130)} = \frac{400 \times 81000000}{1053000000} = \frac{32400000000}{1053000000} = 30.769$$

df = (r − 1)(k − 1) = (2 − 1)(2 − 1) = 1

Entering the probability table of chi-square, we find that for df = 1 the value of chi-square at the 0.001 level should be 10.827. As the obtained value of chi-square is much above it, we conclude that item nos. 6 and 10 are not independent, that is, they are related.

Sometimes it happens that with 1 df any one of the expected cell frequencies becomes less than 5. In such a situation, a correction called Yates' correction for continuity is applied. Some writers have suggested that Yates' correction for continuity should be applied when any of the expected frequencies goes below 10. Where frequencies are large, this correction makes no difference, but where frequencies are small, Yates' correction is significant. Yates' correction consists in reducing the absolute value of the difference between $f_o$ and $f_e$ by 0.5, that is, each $f_o$ which is larger than $f_e$ is decreased by 0.5 and each $f_o$ which is smaller than $f_e$ is increased by 0.5. The formula for chi-square in such a situation is as given below.

$$\chi^2 = \frac{N\left(|AD - BC| - \frac{N}{2}\right)^2}{(A + B)(C + D)(A + C)(B + D)} \qquad (23.40)$$

where the symbols are defined as usual.

Suppose 60 students (50 boys and 10 girls) were administered an attitude scale. The items were to be answered in "Yes" and "No" form. Their responses towards item no. 10 are presented in Table 23.25. The question is: Do the opinions of boys and girls differ significantly?

Table 23.25 Chi-square with Yates' correction in a 2 × 2 table

            Yes     No      Total
Boys        20      30      50
Girls        3       7      10
Total       23      37      60

According to Equation 23.40,

$$\chi^2 = \frac{60\left[\,|(20 \times 7) - (30 \times 3)| - 30\,\right]^2}{(50)(10)(23)(37)} = \frac{60\left[\,|140 - 90| - 30\,\right]^2}{425500} = \frac{60 \times 400}{425500} = \frac{24000}{425500} = 0.056$$

In the above example, the expected frequency (23 × 10/60) is less than 5. Hence, chi-square has been calculated by Equation 23.40. Entering the table for chi-square, we find that for df = 1 the value of chi-square at the 0.05 level should be 3.841. Since the obtained value is less than it (p > 0.05), we conclude that the opinions of boys and girls do not differ significantly.

There are some assumptions of chi-square. Some important assumptions and restrictions for using the chi-square test are presented below.

(a) Random sampling: For the proper use of the chi-square test, it is assumed that the sample under study is selected randomly from the population (Gravetter & Wallnau 1985).

(b) Independence of observations: By independence of observations, it is meant that each observed frequency is generated by a different subject or person. This should not be confused with the concept of independence between variables, as found in the test of independence. If a subject could produce responses that contribute to more than one frequency count or fall in more than one category, the observations are not independent and the use of chi-square would be misleading and inappropriate. Thus, chi-square tests should not be used if the scores are based on the same people being tested more than once (Aron, Aron & Coups 2006).

(c) Size of expected frequency: The size of the expected frequency ($f_e$) tends to influence the value of chi-square. Let us consider the computation of chi-square for a single cell. Suppose the cell has $f_o = 6$ and $f_e = 1$. The contribution of this cell to the total chi-square will be:

$$\text{cell} = \frac{(f_o - f_e)^2}{f_e} = \frac{(6 - 1)^2}{1} = \frac{(5)^2}{1} = 25$$

Now consider another instance where $f_o = 15$ and $f_e = 10$. The difference between $f_o$ and $f_e$ is still 5 but the contribution of this cell to the chi-square value differs from that of the first case:

$$\text{cell} = \frac{(f_o - f_e)^2}{f_e} = \frac{(15 - 10)^2}{10} = \frac{(5)^2}{10} = 2.5$$

It is clear that a small $f_e$ value can have a great impact upon the chi-square value. This problem becomes serious when $f_e$ values are less than 5. In fact, the test is too sensitive when $f_e$ values are extremely small. Thus, the chi-square test should not be used when any of the expected cell frequencies is less than 5.

(d) Assumption of continuity: For the theoretical distributions, chi-square values are assumed to be a continuous variable. This assumption is clearly violated when there is only one degree of freedom. When chi-square is calculated for all samples of a given size, the resulting distribution is not continuous. In such a situation, statisticians recommend Yates' correction for continuity, which consists of subtracting 0.5 from the absolute value of the difference between $f_o$ and $f_e$ (Gravetter & Wallnau 1985).
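The continuity correction mentioned in (d), together with the 2 × 2 shortcut of Equations 23.39 and 23.40, can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `chi_square_2x2` and its `yates` flag are hypothetical, and the cell counts are those used in the two worked 2 × 2 examples (the item data and the boys/girls data).

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2 x 2 table with cells a, b (top row) and c, d (bottom row).

    With yates=False this is Equation 23.39; with yates=True, N/2 is
    subtracted from |AD - BC| before squaring, as in Equation 23.40.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff -= n / 2
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Two test items answered Yes/No by 400 students (no correction needed).
chi2_items = chi_square_2x2(180, 120, 90, 10)

# Boys' and girls' Yes/No responses, with one expected frequency below 5.
chi2_opinions = chi_square_2x2(20, 30, 3, 7, yates=True)

print(round(chi2_items, 3), round(chi2_opinions, 3))
```

The sketch applies the correction exactly as the formula states; it does not guard against the case where |AD − BC| is already smaller than N/2, which a production routine would need to consider.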
Associated with this assumption of continuity, there is controversy regarding the minimum expected frequency. A research article by Lewis and Burke (1949) demonstrated nine common errors in the use of chi-square. One error is related to expected frequencies that are too low. Most statisticians are of the view that every cell of a contingency table should have a reasonable size of expected frequency. Some of them recommended a minimum of 10, with 5 as the bottom limit. Fisher (1938) recommended 10 as the minimum. Still others recommended that the minimum should be some proportion or percentage of the total or that it depended on whether the expected frequencies were equal or not. A rule that has generally been adopted in the 1 degree of freedom situation is that the expected frequency should be equal to or greater than 5. When df > 1, the expected frequency should be equal to or greater than 5 in at least 80% of the cells. When these requirements are not met, other statistical tests are available (Siegel & Castellan 1988). However, a major and significant review of the research on the topic was done by Delucchi (1983). He drew two major conclusions as under:


(a) The chi-square test may be properly used in cases where the expected frequencies are much lower than the value considered permissible. The most important principle seems to be that there should be at least five times as many individuals as there are cells. For example, a cell with a very low expected frequency would be acceptable in a 2 × 2 contingency table if there are at least 20 subjects in the study overall. In case the researcher has a table larger than 2 × 2 with a cell having an extremely low expected frequency, and the number of subjects or participants is also small, one common step would be to combine related categories to increase the expected frequency and reduce the total number of cells.

(b) Even though using chi-square with small expected frequencies may be acceptable, it is still not a wise step. This is because the probability of getting a significant result, even if the research hypothesis is true, may be quite slim. In other words, with small expected frequencies, the power of the chi-square test is very low. Thus the risk of Type II error becomes high.


The chi-square test has some limitations, too. Important limitations are as under:

(a) As said earlier, the chi-square test cannot be used when the researcher has counted or included a subject more than once. This error produces what is called an inflated N; it is extremely serious and may easily lead to the rejection of the null hypothesis when, in fact, it is true (Type I error).

(b) Another limitation of chi-square stems from the fact that the value of chi-square is proportional to the sample size. Let us take an example. In a 2 × 2 chi-square table the following data were obtained with respect to the relationship between sex and the level of anxiety:

                                  Sex
                          Male          Female        Total
Level of       High       32 (27.5)     28 (32.5)      60
anxiety        Low        23 (27.5)     37 (32.5)      60
               Σ          55            65            120

$\chi^2$ = 2.72 ($f_e$ appears in brackets in each cell)

When the observed frequency of each cell is doubled, that is, 64, 46, 56 and 74, $\chi^2$ would be equal to 5.44 rather than 2.72. Despite the fact that the relationship has not changed, the researcher would reject rather than accept the null hypothesis. For this reason, many researchers prefer to avoid the chi-square test when dealing with large samples because the results can be very misleading.

(c) Still another limitation arises from a generally adopted rule related to the situation with small N's or when the expected frequency, proportion or percentage in a cell is small. The common rule is that in the 1 degree of freedom situation, the expected frequency in all cells should be equal to or greater than 5. When df > 1, the expected frequency should be equal to or greater than 5 in at least 80% of cells. When these requirements are not met, other statistical tests are available (Siegel & Castellan 1988).
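The dependence of chi-square on sample size described in limitation (b) can be illustrated with a small Python sketch. It assumes the four cell frequencies are half of the doubled values quoted in the text (64, 46, 56 and 74), arranged so that the marginal totals 55 and 65 are reproduced; the function name `chi_square_2x2` is a hypothetical helper implementing Equation 23.39.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for a 2 x 2 table (Equation 23.39)."""
    n = a + b + c + d
    return n * abs(a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Cells in the order: High/Male, High/Female, Low/Male, Low/Female.
original = (32, 28, 23, 37)
doubled = tuple(2 * f for f in original)  # same proportions, twice the N

chi2_original = chi_square_2x2(*original)
chi2_doubled = chi_square_2x2(*doubled)

# The pattern of the relationship is unchanged, yet chi-square doubles with N.
print(round(chi2_original, 2), round(chi2_doubled, 2))
print(round(chi2_doubled / chi2_original, 2))
```

Because every proportion in the table is identical in the two cases, the only thing that differs is N, which is why the ratio of the two chi-square values equals the ratio of the sample sizes.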
Mann-Whitney U Test

The Mann-Whitney U test is a nonparametric substitute for the parametric t test. This test was jointly published by H B Mann and D R Whitney in 1947. The Mann-Whitney U test is used when the researcher is interested in testing the significance of the difference between two independently drawn samples or groups, that is, two separate and uncorrelated groups. For application of the U test it is essential that the data have been obtained on ordinal measurement, that is, they must have been obtained in terms of rank. Where the data have been obtained in terms of scores, for application of the Mann-Whitney U test it is essential that those scores be converted into ranks without much loss of information. It is not necessary for the application of the Mann-Whitney U test that the two groups be of unequal size. This test can also be applied to groups having equal size.











The rationale behind the Mann-Whitney U test is that if two groups are really samples from the same population, the ratio of the sums of ranks in the two groups would be roughly proportional to the ratio of the number of cases in the two groups. If the sums of ranks are not so proportional, the researcher rejects the null hypothesis and concludes that the two groups under study differ. When sample sizes are very small, that is, when both $N_1$ and $N_2$ are made up of 20 or fewer cases, readers are referred to Siegel's (1956) book, where methods and probability tables that apply to such small samples are given. In this book we shall deal with methods of calculating the Mann-Whitney U test which are concerned with larger sample sizes, say, more than 20 cases. The reason for selecting only these methods is that in research we are more often confronted with sample sizes which are greater than 20. The equations for calculating the Mann-Whitney U test are as given below.

$$U = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - \sum R_1 \qquad (23.41)$$

$$U = N_1 N_2 + \frac{N_2(N_2 + 1)}{2} - \sum R_2 \qquad (23.42)$$


Let us illustrate the calculation of the Mann-Whitney U test, which can be done by either Equation 23.41 or Equation 23.42. Table 23.26 presents the scores of two groups on the Lie Scale. Group I has 10 subjects and Group II has 21 subjects. The first step is to rank all the scores in one combined distribution in an increasing order of size. In Table 23.26 the lowest score (taking both sets of scores together) is 7 (second column) and hence, we give it a rank of 1. The next score is 8, which is again in the second column, and it has been given a rank of 2. The third score from below is 10 (first column), which has been given a rank of 3. In this way, ranking is continued until all scores receive ranks. Subsequently, the two columns of ranks are summed. At this juncture, a check on the arithmetical calculation is imposed. The check is that the sums of these two columns must be equal to N(N + 1)/2, where $N = N_1 + N_2$.

Check:

$$\sum R_1 + \sum R_2 = 88.5 + 407.5 = 496; \qquad \frac{N(N + 1)}{2} = \frac{(31)(32)}{2} = 496$$

Hence, we can proceed:

By Equation 23.41:

$$U = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - \sum R_1 = (10)(21) + \frac{10(10 + 1)}{2} - 88.5 = 176.5$$

By Equation 23.42:

$$U = N_1 N_2 + \frac{N_2(N_2 + 1)}{2} - \sum R_2 = (10)(21) + \frac{21(21 + 1)}{2} - 407.5 = 33.5$$
Table 23.26 Calculation of the Mann-Whitney U test from larger sample sizes 
Gr. 7 R R. 
(N, =10) (N, =21) 
NS SSS eee ss 
18 32 ? 13 
Ms 40 5 18 
30 
10 : 2 i 
39 
: 3 16.5 
te 15 
| 16.5 6 
26 3 : 
27 : é 
“t 10 


> <—s 




19 i 8 14 
35 a bi 22 
"1 bid 4 20 
z 1 
a0 21 
Sl 27 
a8 24 
53 23 
29 25 
60 26 
65 30 
63 28 
67 31 
64 29 
a ER, =88.5 ER, = 407.5 


It is the lower value of U that we want. For testing the significance of the obtained U, its value is converted into a z score as shown below.

$$z = \frac{U - \frac{N_1 N_2}{2}}{\sqrt{\frac{N_1 N_2 (N_1 + N_2 + 1)}{12}}} \qquad (23.43)$$

$$z = \frac{176.5 - \frac{(10)(21)}{2}}{\sqrt{\frac{(10)(21)(10 + 21 + 1)}{12}}} = \frac{71.5}{23.664} = 3.02$$

A z score from ±1.96 to ±2.58 is taken to be significant at the 0.05 level of significance and if the z score is greater than even ±2.58, we take it to be significant at the 0.01 level. Since the obtained z is 3.02, we can take the value of the Mann-Whitney U to be a significant one. Rejecting the null hypothesis, it is concluded that the two groups differ significantly on the measures of the Lie Scale.

According to Equation 23.42, the obtained value of the Mann-Whitney U differs but the magnitude of the z score remains unaffected. However, the sign of the z score is changed.

$$z = \frac{33.5 - \frac{(10)(21)}{2}}{\sqrt{\frac{(10)(21)(10 + 21 + 1)}{12}}} = \frac{-71.5}{23.664} = -3.02$$

Thus we obtain the same z score but the sign is negative. The change of sign makes no difference in the interpretation of the Mann-Whitney U test.
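The arithmetic of Equations 23.41 to 23.43 can be sketched in Python as below. The sketch starts from the two rank sums reported for Table 23.26 rather than from the raw scores; the function names `u_from_rank_sums` and `z_for_u` are hypothetical helpers introduced only for this illustration.

```python
import math


def u_from_rank_sums(n1, n2, sum_r1, sum_r2):
    """Return the two U values of Equations 23.41 and 23.42."""
    n = n1 + n2
    # Arithmetic check from the text: the rank sums must total N(N + 1)/2.
    if abs((sum_r1 + sum_r2) - n * (n + 1) / 2) > 1e-9:
        raise ValueError("rank sums do not add up to N(N + 1)/2")
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - sum_r1  # Equation 23.41
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - sum_r2  # Equation 23.42
    return u1, u2


def z_for_u(u, n1, n2):
    """Normal approximation of Equation 23.43."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


u1, u2 = u_from_rank_sums(10, 21, 88.5, 407.5)
z1 = z_for_u(u1, 10, 21)
z2 = z_for_u(u2, 10, 21)
print(u1, u2, round(z1, 2), round(z2, 2))
```

The two U values always sum to $N_1 N_2$, which is why the two z scores have the same magnitude and opposite signs.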


Rank-Difference Methods

The methods of correlation based upon rank differences are very popular among behavioural scientists. There are two most common methods which are based upon the differences in ranks assigned on the X and Y variables. One is the Spearman rank-difference method and the other is the Kendall rank-difference method.

The Spearman rank-difference method, symbolized by ρ (read rho), is a popular method of computing the correlation coefficient between two sets of ranks or between two sets of scores converted into ranks. When a researcher is not concerned with whether or not a relationship is linear, Spearman correlation is a better choice. The method has been named after Spearman, who discovered it. This method is applicable when the number of pairs of scores or ranks is preferably small, that is, 30 or below. The equation is:

$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \qquad (23.44)$$

where ρ = Spearman's rank-difference correlation coefficient; D = difference between ranks; and N = number of pairs of ranks or scores.

To illustrate the calculation of ρ let us consider the data given in Table 23.27, which show the scores of 12 students on the intelligence test (X) as well as on the educational test (Y). According to Equation 23.44, the first step is to rank both sets of scores separately, giving the highest score a rank of 1, the next highest score a rank of 2, and so on. Then, keeping the algebraic signs in view, the difference between the two sets of ranks is computed. This is noted under column D. Subsequently, each difference is squared and noted under column D². Substituting the values in the equation, we get a ρ of −0.185. Following Siegel & Castellan (1988, Table Q, p 360) we can test the significance of the obtained ρ. Since the obtained value of ρ is less than the value given at the 0.05 level (p > 0.05) for N = 12, we can accept the null hypothesis and conclude that X and Y are independent and whatever correlation has been found is due to the chance factor.
Table 23.27 Illustration of the Spearman rank-difference correlation

X              Y      Rank X    Rank Y    D (R_x − R_y)    D²
47             68      8.5       1         +7.5            56.25
50             60      5.5       2.5       +3.0             9.00
70             54      2         7         −5.0            25.00
72             53      1         8         −7.0            49.00
46             60     10         2.5       +7.5            56.25
50             55      5.5       6         −0.5             0.25
42             48     11         9         +2.0             4.00
58             30      3        12         −9.0            81.00
[illegible]    45      4        10         −6.0            36.00
36             43     12        11         +1.0             1.00
49             59      7         4         +3.0             9.00
47             56      8.5       5         +3.5            12.25
                                           ΣD = 0.0        ΣD² = 339.00

$$\rho = 1 - \frac{6(339)}{12(144 - 1)} = 1 - \frac{2034}{1716} = 1 - 1.185 = -0.185$$


Relation between Spearman Correlation and Pearson Correlation

If we compute the Pearson correlation and the Spearman correlation for the same set of data, how will the two correlations be related? Since these two correlations are measuring different things, they will produce different values. As we know, the Pearson correlation measures a linear relationship whereas the Spearman correlation measures a monotonic relationship. In most cases, the value obtained by the Spearman correlation is larger (that is, closer to +1.00 or −1.00) than the value obtained by the Pearson correlation for the same data. This happens because it is easier to be perfect by Spearman's criteria than by Pearson's criteria. However, there is a situation where the Pearson correlation will produce a larger value than the Spearman correlation. When there is an extreme score in the data, this single score may have a large influence on the value of the Pearson correlation. The single extreme score tends to exaggerate the magnitude of the correlation, bringing it nearer to +1.00 or −1.00 than would be expected from the other score points. The exaggerated influence of the extreme score is eliminated with the Spearman correlation because the ranking process reduces the distance between adjacent scores to exactly one (that is, first to second, second to third, third to fourth, and so on). This fact is illustrated below (N = 5).

X         Y
0         2
1         0
2         3
3         1
15        15
Σ = 21    Σ = 21

Pearson r = 0.967
Spearman rho = 0.50

It is obvious from the above example that the single score point (15, 15) has inflated the value of Pearson r (0.97), making it much larger than Spearman rho (0.50) and bringing it closer to +1.00.
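The effect of the single extreme score can be reproduced with a short Python sketch. It is an illustration only: `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helper names, the ranking is done from the lowest score upward (the direction does not change ρ as long as both variables are ranked the same way), and ρ is computed with Equation 23.44.

```python
import math


def average_ranks(values):
    """Rank from the lowest value (rank 1) upward; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Product-moment correlation from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Rank-difference correlation, Equation 23.44."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [0, 1, 2, 3, 15]
y = [2, 0, 3, 1, 15]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 2))
```

Dropping the last pair (15, 15) from both lists and re-running the sketch shows how much of the Pearson value is owed to that one point.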
Another method of computing the rank-difference correlation has been developed by Kendall. The method is known as Kendall's tau, for which the formula is as follows:

$$\tau = \frac{S}{\frac{1}{2} N(N - 1)} \qquad (23.45)$$

where τ = Kendall's tau; S = actual total; and N = number of objects or scores which have been ranked.

Table 23.28 Scores of 12 students on Xand Y test 
F G H | J K a 


A B C D E 
| 19 «628 «= 30s‘ 


x 06 6 Tf tw wo 8 BAS 
. =- 76 ria 47 36 35 


Y ) go 40 45 38 4 






Suppose 12 students have been administered the X and Y tests and their scores are presented in Table 23.28. The first step is to rank both sets of scores separately, giving the highest score a rank of 1, the next higher a rank of 2, and so on. Table 23.29 presents the ranks based upon the two sets of scores given in Table 23.28. Subsequently, the ranks of the X test are rearranged so that they appear in a natural order like 1, 2, 3, ..., 12. Accordingly, the ranks on the Y test are adjusted. Table 23.30 presents the ranks in the rearranged order.

Table 23.29 Ranks based upon two sets of scores given in Table 23.28

Table 23.30 Rearranged order of ranks

Rank on X     1    2    3    4    5    6    7    8    9    10    11    12
Rank on Y    11    7    1    3    6    2    5    4    9     8    10    12

Now S is computed. For this, we start with the ranks on the Y test from the left side. The first rank on the left side is 11. Count the number of ranks to its right which are above 11 and the number which are below it. Only one rank (that is, 12) falling to the right of the first rank on the Y test is above 11 and the remaining 10 ranks fall below it. Hence, its contribution to S would be equal to 1 − 10. Likewise, the second rank on the Y test is 7. Four ranks falling to the right of 7 are above 7 and 6 ranks are below it. Hence, its contribution to S would be 4 − 6. Identical procedures are repeated for the other ranks on the Y test. Thus:

S = (1 − 10) + (4 − 6) + (9 − 0) + (7 − 1) + (4 − 3) + (6 − 0) + (4 − 1) + (4 − 0) + (2 − 1) + (2 − 0) + (1 − 0)
  = (−9) + (−2) + (9) + (6) + (1) + (6) + (3) + (4) + (1) + (2) + (1) = 33 − 11 = 22

Substituting in the formula:

$$\tau = \frac{S}{\frac{1}{2} N(N - 1)} = \frac{22}{\frac{1}{2}(12)(12 - 1)} = \frac{22}{66} = 0.33$$

The significance of tau is tested by converting it into a z score, the formula for which is as given below.

$$z = \frac{\tau}{\sqrt{\frac{2(2N + 5)}{9N(N - 1)}}} \qquad (23.46)$$

Hence,

$$z = \frac{0.33}{\sqrt{\frac{2[(2)(12) + 5]}{9(12)(12 - 1)}}} = \frac{0.33}{\sqrt{0.0488}} = \frac{0.33}{0.2209} = 1.4938$$

Since the obtained z score is less than 1.96, we can say that this is not significant even at the 0.05 level. Accepting the null hypothesis, we can say that the given sets of scores are not correlated.

According to Siegel (1956, 214), tau has one advantage over rho, and that is that the former can be generalized to partial correlation. If both tau and rho are computed from the same data, the answers will not be the same and hence, numerically, they are not equal.
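The counting procedure for S and the conversion of tau to z (Equations 23.45 and 23.46) can be sketched in Python. One loud assumption: the sequence of Y ranks used below is not copied from a printed table but inferred from the individual terms of the S computation in the text (the only ordering of ranks 1 to 12 that yields those terms), so it should be treated as a reconstruction. The function name `kendall_s` is hypothetical.

```python
import math

# Y ranks in the order obtained after arranging the X ranks as 1, 2, ..., 12.
# ASSUMPTION: inferred from the terms (1 - 10), (4 - 6), (9 - 0), ... of S.
y_ranks = [11, 7, 1, 3, 6, 2, 5, 4, 9, 8, 10, 12]


def kendall_s(ranks):
    """For each rank, count the ranks to its right that are above it minus
    those that are below it, and add up these differences."""
    s = 0
    for i, r in enumerate(ranks):
        right = ranks[i + 1:]
        above = sum(1 for v in right if v > r)
        below = sum(1 for v in right if v < r)
        s += above - below
    return s


n = len(y_ranks)
s = kendall_s(y_ranks)
tau = s / (0.5 * n * (n - 1))                              # Equation 23.45
z = tau / math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))   # Equation 23.46
print(s, round(tau, 2), round(z, 2))
```

The z printed by the sketch uses the unrounded tau (22/66), so it comes out slightly above the 1.4938 obtained in the text from the rounded value 0.33; the conclusion (z below 1.96) is the same.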


generalized to partial correlat 
answer will 





Kendall's Partial Rank-order Correlation

Whenever a correlation exists between two variables, there is a possibility that the correlation is due to the association between each of these two variables and a third variable. For example, among a group of schoolchildren of diverse age, one might find a correlation between height and weight. This correlation may not reflect a genuine relationship between height and weight; rather, it may result from the fact that both height and weight are correlated with a third variable, that is, age. Likewise, if one finds a correlation between size of vocabulary and memorization ability among a group of children of the same age, this correlation may not be a true one because both these variables, that is, size of vocabulary and memorization ability, may be associated with a third variable, that is, intelligence.

In designing an experiment, the researcher has the alternative of either introducing experimental control to eliminate or control the impact of the third variable or using some statistical method to eliminate the influence of the third variable. By experimental control, the researcher may select subjects of the same intelligence level and then correlate the two variables, that is, size of vocabulary and memorization ability. Where experimental control is not feasible, statistical control may be applied. Partial correlation is one such method, in which the effect of variation in a third variable upon the relation between the other two variables, namely, X and Y, is eliminated. Thus, by using partial correlation in the above example, the researcher can compute the relation between memorization ability and the size of vocabulary by holding the influence of intelligence constant.

Kendall's partial rank-order correlation is a nonparametric statistic in which the researcher correlates two variables measured on an ordinal scale by holding the influence of the third variable constant. Here, no assumption about the shape of the population distribution of scores needs to be made. When X and Y are to be correlated by controlling the Z variable, Kendall's tau should be calculated between X and Y, X and Z, and Y and Z. The formula for calculating Kendall's partial rank-order correlation coefficient is as under.

$$\tau_{xy.z} = \frac{\tau_{xy} - \tau_{xz}\,\tau_{yz}}{\sqrt{(1 - \tau_{xz}^2)(1 - \tau_{yz}^2)}} \qquad (23.47)$$

where $\tau_{xy.z}$ = Kendall's partial rank-order correlation
$\tau_{xy}$ = Kendall's tau between X and Y
$\tau_{yz}$ = Kendall's tau between Y and Z
$\tau_{xz}$ = Kendall's tau between X and Z

Let us take an example. Suppose the size of vocabulary (X), memorization ability (Y) and intelligence (Z) are three variables. The researcher wants to compute $\tau_{xy.z}$, that is, he wants to have the partial rank-order correlation between X and Y by controlling the impact of Z. Suppose further that the values of Kendall's tau (Kendall's rank-order correlation) obtained from a distribution (N = 22) are as under:

$\tau_{xy}$ = 0.55
$\tau_{xz}$ = 0.46
$\tau_{yz}$ = 0.63

Now, substituting the values in formula 23.47, we get:

$$\tau_{xy.z} = \frac{0.55 - (0.46)(0.63)}{\sqrt{(1 - 0.46^2)(1 - 0.63^2)}} = \frac{0.26}{0.70} = 0.37$$

With larger N (exceeding 20), the significance of $\tau_{xy.z}$ may be tested by calculating the value of z as under.

$$z = \frac{3\,\tau_{xy.z}\sqrt{N(N - 1)}}{\sqrt{2(2N + 5)}} \qquad (23.48)$$

Thus,

$$z = \frac{(3)(0.37)\sqrt{22(22 - 1)}}{\sqrt{2(2 \times 22 + 5)}} = \frac{23.85}{9.90} = 2.41$$

Since the obtained value of z exceeds 1.96 but falls short of 2.58, it may be concluded that $\tau_{xy.z}$ is significant at the 0.05 level but not at the 0.01 level.
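Formulas 23.47 and 23.48 can be sketched in Python as follows. The function names `partial_tau` and `z_for_partial_tau` are hypothetical; the three tau values and N = 22 are the ones used in the worked example. Because the formula is symmetric in the two taus involving Z, it does not matter which of 0.46 and 0.63 is taken as the X-Z value.

```python
import math


def partial_tau(t_xy, t_xz, t_yz):
    """Kendall's partial rank-order correlation of X and Y with Z held constant."""
    return (t_xy - t_xz * t_yz) / math.sqrt((1 - t_xz ** 2) * (1 - t_yz ** 2))


def z_for_partial_tau(tau, n):
    """Large-sample z of Equation 23.48."""
    return 3 * tau * math.sqrt(n * (n - 1)) / math.sqrt(2 * (2 * n + 5))


t = partial_tau(0.55, 0.46, 0.63)
z_rounded = z_for_partial_tau(0.37, 22)  # using tau rounded as in the text
z_exact = z_for_partial_tau(t, 22)       # using the unrounded tau
print(round(t, 3), round(z_rounded, 2), round(z_exact, 2))
```

Carrying the unrounded coefficient through gives a slightly larger z than the hand calculation, but both values lie between 1.96 and 2.58, so the conclusion about significance is unchanged.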


Coefficient of Concordance W

The coefficient of concordance, symbolized by the letter W, has been developed by Kendall and is a measure of correlation between more than two sets of ranks. Thus, W is a measure of correlation always among more than two sets of rankings of events, objects and individuals. When the investigator is interested in knowing the inter-test reliability, W is chosen as the most appropriate statistic. One characteristic of W which distinguishes it from other methods of correlation is that it is either zero or positive. It cannot be negative. W can be computed with the help of the formula given below.

$$W = \frac{S}{\frac{1}{12} K^2 (N^3 - N)} \qquad (23.49)$$

where W = coefficient of concordance; S = sum of squares of deviations from the mean of $R_j$; K = number of judges or sets of rankings; and N = number of objects or individuals which have been ranked.

Suppose four teachers (A, B, C and D) ranked 8 students on the basis of performance shown in the classroom. The ranks given by the four teachers are presented in Table 23.31. The details of the calculations have also been shown.


Table 23.31 Ranks given by four teachers to eight students on the basis of classroom performance
ee 4 ji) sii) iv) vi wil 
A 3 4 7 ; 8 6 2 
i 2 3 t 4 8 7 9 
C 3 5 6 8 7 : ; 


$$\text{Mean of } R_j = \frac{9 + 14 + 20 + 20 + 31 + 26 + 6 + 18}{8} = \frac{144}{8} = 18$$

$$S = (9 - 18)^2 + (14 - 18)^2 + (20 - 18)^2 + (20 - 18)^2 + (31 - 18)^2 + (26 - 18)^2 + (6 - 18)^2 + (18 - 18)^2$$

$$= (-9)^2 + (-4)^2 + (2)^2 + (2)^2 + (13)^2 + (8)^2 + (-12)^2 + (0)^2 = 482$$

Now substituting in Equation 23.49:

$$W = \frac{482}{\frac{1}{12}(4)^2(8^3 - 8)} = \frac{482}{672} = 0.72$$

When N > 7, the significance of W is tested by converting its value into $\chi^2$ with the help of the following equation:

$$\chi^2 = K(N - 1)W \qquad (23.50)$$

Thus $\chi^2$ = 4(8 − 1)(0.72) = 20.16, and df in this situation is always equal to N − 1. Hence df = 8 − 1 = 7. Entering the probability table for chi-square, we find that the value of chi-square for df = 7 at the 0.01 level of significance should be 18.475. Since the obtained value of chi-square exceeds this required value, we can take this value of W as a significant one. Thus, rejecting the null hypothesis, we can say that there is an overall significant relationship in the ranking done by the four teachers.
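The computation of W and its chi-square conversion (Equations 23.49 and 23.50) can be sketched in Python. The sketch starts from the eight rank sums $R_j$ quoted in the calculation (the individual ranks of each teacher are not needed for the formula); the function name `kendall_w` is hypothetical.

```python
def kendall_w(rank_sums, k):
    """Return (S, W, chi-square, df) from the rank sums R_j of N objects
    ranked by k judges."""
    n = len(rank_sums)
    mean_r = sum(rank_sums) / n
    s = sum((r - mean_r) ** 2 for r in rank_sums)
    w = s / (k ** 2 * (n ** 3 - n) / 12)  # Equation 23.49
    chi2 = k * (n - 1) * w                # Equation 23.50
    return s, w, chi2, n - 1


# Sums of the ranks given by the four teachers to each of the eight students.
rank_sums = [9, 14, 20, 20, 31, 26, 6, 18]
s, w, chi2, df = kendall_w(rank_sums, k=4)
print(s, round(w, 2), round(chi2, 2), df)
```

The chi-square printed here uses the unrounded W, so it is marginally smaller than the 20.16 obtained from W = 0.72; it still exceeds 18.475.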


Median Test 


The median test is used to see if two groups (not necessarily of the same size) come from the same population or from populations having the same median. A median test may be readily extended to more than two samples, that is, for several independent groups (Kurtz & Mayo 1980). In the median test, the null hypothesis is that there is no difference between the two sets of scores because they have been taken from the same population. If the null hypothesis is true, half of the scores in both groups should lie above the median and the remaining half of the scores should lie below the median. Table 23.32 presents the scores of two groups of students in an arithmetic test.

The first step in computation of a median test is to compute a common median for both distributions taken together.







Table 23.32 Scores of 30 students in an arithmetic test

Gr. A (N = 16): 16, 17, 8, 12, 14, 9, 7, 5, 20, 22, 4, 26, 27, 5, 10, 19
Gr. B (N = 14): 28, 30, 33, 40, 45, 47, 40, 38, 42, 50, 20, 18, 18, 19


For computing the common median, both the distributions are pooled together as shown below.

Class interval    f
49-53             1
44-48             2
39-43             3
34-38             1
29-33             2
24-28             3
19-23             5
14-18             5
9-13              3
4-8               5
                  N = 30

\[ \text{Median} = l + \frac{N/2 - F}{f} \times i = 18.5 + \frac{30/2 - 13}{5} \times 5 = 20.5 \]


Subsequently, a 2 x 2 contingency table is set as follows.

          Above Mdn    Not above Mdn    Total
Gr. A         3             13            16
Gr. B        10              4            14
Total        13             17            30

Now, the chi-square test can be applied. For computing chi-square from a 2 x 2 table, one may follow Equation 23.39. Yates' correction is not needed here because none of the cells contains an expected frequency less than 5.

Now substituting the values in Equation 23.39, we get:

\[ \chi^2 = \frac{30[(3)(4) - (13)(10)]^2}{(16)(14)(13)(17)} = \frac{30(12 - 130)^2}{49504} = \frac{417720}{49504} = 8.438 = 8.44 \]

\[ df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1 \]


Entering the probability table for chi-square we find that for df = 1, the chi-square value at the 0.01 level should be 6.635. Since the obtained value of the chi-square exceeds this value, we reject the null hypothesis and conclude that the two samples have not been drawn from the same population.
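The steps of the median test can be sketched as follows. The function name `median_test_2x2` is hypothetical, and the common median is passed in as an argument so that the grouped-data median of 20.5 computed in the text can be used directly. The chi-square is the uncorrected 2 x 2 formula N(AD - BC)^2 / [(A + B)(C + D)(A + C)(B + D)], which is assumed here to be what Equation 23.39 denotes, since it reproduces the substitution shown in the text.

```python
def median_test_2x2(group_a, group_b, common_median):
    """Build the 2 x 2 table (above / not above the median) and its chi-square."""
    a = sum(x > common_median for x in group_a)  # Gr. A above the median
    b = len(group_a) - a                         # Gr. A not above
    c = sum(x > common_median for x in group_b)  # Gr. B above the median
    d = len(group_b) - c                         # Gr. B not above
    n = a + b + c + d
    # Uncorrected chi-square for a 2 x 2 table (no Yates' correction)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return (a, b, c, d), chi_square


group_a = [16, 17, 8, 12, 14, 9, 7, 5, 20, 22, 4, 26, 27, 5, 10, 19]
group_b = [28, 30, 33, 40, 45, 47, 40, 38, 42, 50, 20, 18, 18, 19]
cells, chi_square = median_test_2x2(group_a, group_b, common_median=20.5)
print(cells, round(chi_square, 2))
```

With df = 1, the resulting value is then compared with the tabled chi-square at the chosen level of significance.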


Kruskal-Wallis H Test

The primary difference between the Kruskal-Wallis H test on the one hand and the Friedman test on the other (to be discussed next) is that the H test is a one-way nonparametric analysis of variance, whereas the Friedman test is a two-way nonparametric analysis of variance.

The H test is used when the investigator is interested in knowing whether or not groups of independent samples have been drawn from the same population. If the obtained data do not fulfil the two basic parametric assumptions, namely, assumption of normality and assumption of homogeneity of variances, the H test is the most appropriate statistic. The equation for the H test is as follows:

\[ H = \frac{12}{N(N + 1)} \sum \frac{R_j^2}{N_j} - 3(N + 1) \tag{23.51} \]

where N = number in all samples combined; R_j = sum of ranks in j sample; and N_j = number in j sample.

Scores of three groups are presented in Table 23.33. The first step is to combine all the scores from all of the groups and rank them, with the lowest score receiving a rank of 1 and the largest score a rank of N. Ties are treated in the usual fashion in ranking. Subsequently, the sum of ranks in each group or column is found, and the H test determines whether these sums of ranks are so disparate that the three groups cannot be regarded as being drawn from the same population. The ranks assigned to each score earned by the members of the group are given in brackets.

Table 23.33 (rank totals of the three groups): R_1 = 38, R_2 = 76, R_3 = 186

Now, substituting the values in Equation 23.51, we have


\[ H = \frac{12}{24(24 + 1)} \left[ \frac{(38)^2}{6} + \frac{(76)^2}{8} + \frac{(186)^2}{10} \right] - 3(24 + 1) \]
\[ = \frac{53067.2}{600} - 75 = 88.445 - 75 = 13.445 = 13.44 \]
600 











When each sample has five or more than five cases, the H test is interpreted as chi-square. In such a situation, df = number of groups or samples minus one. So, here df = 3 - 1 = 2. Entering the probability table for chi-square, we find that for df = 2 the value of chi-square at the 0.01 level of significance should be 9.210. Since the obtained value of the H test exceeds this required value, it can be said that the H value is significant. Rejecting the null hypothesis, we conclude that the samples are independent and they have not been drawn from the same population. When the number of cases in each sample is from one to five, the H test can be interpreted through special tables, which can be found in Siegel (1956) and Downie & Heath (1979).
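Equation 23.51 can be evaluated directly from the rank totals. The sketch below is illustrative only: the function name `kruskal_wallis_h` is hypothetical, no correction for ties is applied, and the group sizes 6, 8 and 10 are the ones implied by N = 24 and the substitution shown above.

```python
def kruskal_wallis_h(rank_sums, group_sizes):
    """Evaluate Equation 23.51 from per-group rank totals R_j and sizes N_j."""
    n_total = sum(group_sizes)
    # Sum of R_j^2 / N_j over the groups
    weighted = sum(r ** 2 / n for r, n in zip(rank_sums, group_sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


h = kruskal_wallis_h(rank_sums=[38, 76, 186], group_sizes=[6, 8, 10])
df = 3 - 1  # number of groups minus one
print(f"H = {h:.3f}, df = {df}")
```

The unrounded result is about 13.445, the value reported in the text as 13.44.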


Friedman Test

As mentioned above, the Friedman test is a two-way nonparametric analysis of variance. When the groups are matched and doubt exists about the two basic parametric assumptions, namely, the assumption of normality and the assumption of homogeneity of variances, the investigator resorts to the Friedman test for testing whether or not the samples have been drawn from the same population. Data to illustrate the calculation of the Friedman test have been presented in Table 23.34. The equation for calculating the Friedman test is as follows:

\[ \chi_r^2 = \frac{12}{NK(K + 1)} \sum R_j^2 - 3N(K + 1) \tag{23.52} \]

where N = number of rows; K = number of columns; and R_j = sum of ranks of each column.


Table 23.34 Friedman test of two-way ANOVA from scores of three groups on a recall test

          I         II        III       IV        V
Gr. A     12 (4)    8 (2)     4 (1)     10 (3)    16 (5)
Gr. B     10 (3)    7 (2)     6 (1)     11 (4)    17 (5)
Gr. C     7 (2)     8 (3)     4 (1)     12 (5)    10 (4)
R_j       9         7         3         12        14
The first step in calculation of the Friedman test is to rank each score in each row separately, giving the lowest score in each row a rank of 1 and the next lowest score in each row a rank of 2, and so on. The ranking can also be done in reversed order, that is, giving the highest score in each row a rank of 1, the next highest score in each row a rank of 2, and so on. The rank assigned to each score in each row is given in parentheses. The Friedman test is applied to determine whether or not the rank totals, symbolized by R_j, differ significantly. Now, substituting the values in Equation 23.52, we have

\[ \chi_r^2 = \frac{12}{(3)(5)(5 + 1)} \left[ (9)^2 + (7)^2 + (3)^2 + (12)^2 + (14)^2 \right] - 3(3)(5 + 1) \]
\[ = 63.866 - 54 = 9.87 \]

When the number of rows (N) and the number of columns (K) are too small, the significance of the Friedman test can be ascertained with the help of special tables (Siegel 1956). For example, when K = 3, N = 2 to 9 or when K = 4, N = 2 to 4, the significance of the Friedman test can be determined through these special tables. But when the number of rows and columns are greater than those said above, the Friedman test is interpreted as the chi-square test. In the present example, the significance of the Friedman test would be interpreted in terms of chi-square. The df is always equal to K - 1 for chi-square applied as a test of significance of the Friedman test. Hence df in the present example would be K - 1 = 5 - 1 = 4. Entering the table for df = 4, we find that the chi-square value should be 9.488 at the 0.05 level of significance. Since the obtained value of the Friedman test exceeds the required value (p < 0.05), we reject the null hypothesis and conclude that the three matched groups differ significantly.
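The Friedman calculation can be sketched as below, using the scores of Table 23.34. The helper names `rank_row` and `friedman_chi_square` are hypothetical, and the ranking helper assumes that no two scores in a row are tied, which holds for this table.

```python
def rank_row(scores):
    """Rank the scores of one row, lowest score = rank 1 (assumes no ties)."""
    ordered = sorted(scores)
    return [ordered.index(x) + 1 for x in scores]


def friedman_chi_square(rows):
    """Evaluate Equation 23.52 for N rows (matched groups) and K columns."""
    n, k = len(rows), len(rows[0])
    ranks = [rank_row(row) for row in rows]
    # R_j: sum of the ranks in each column
    col_sums = [sum(r[j] for r in ranks) for j in range(k)]
    stat = 12 / (n * k * (k + 1)) * sum(c ** 2 for c in col_sums) - 3 * n * (k + 1)
    return col_sums, stat


scores = [
    [12, 8, 4, 10, 16],  # Gr. A
    [10, 7, 6, 11, 17],  # Gr. B
    [7, 8, 4, 12, 10],   # Gr. C
]
col_sums, stat = friedman_chi_square(scores)
print(col_sums, round(stat, 3))
```

The column totals reproduce the R_j row of the table, and the statistic is referred to the chi-square table with df = K - 1.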


Correlational methods are the most commonly used statistical techniques in the testing field. We have already discussed some important methods of correlation, namely, Pearson r, Spearman rank difference method and Kendall's tau. As we know, a correlation coefficient is a mathematical index that describes the direction and magnitude of a relationship. There is also a related technique called regression, a term first used by Sir Francis Galton in 1885, and today it is used to make a prediction about scores on one variable on the basis of known score on another variable or possibly several other variables (Ferguson & Takane 1989). In fact, these predictions are done from the regression line, which is defined as the best-fitting straight line through a set of points in a scatter diagram. It is estimated by using the principle of least squares which, in fact, minimizes the squared deviations around the regression line. Let us explain through an example.

As discussed above, the mean is the point of least squares for any single variable. In other words, the sum of squared deviations around the mean will be less than it is around any value other than the mean. For example, the mean of the five scores, namely, 2, 3, 4, 5 and 6 is 20/5 = 4. The squared deviation of each score around the mean can now be easily determined. For score 6, the squared deviation is (6 - 4)^2 = 4; for score 5, the squared deviation is (5 - 4)^2 = 1. The score 4 is equal to the mean and therefore, the squared deviation around the mean will be (4 - 4)^2 = 0. Thus, by definition the mean will always be the point of least squares. The regression line is the line of least squares or running mean in two dimensions or in the space created by two variables.
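The claim that the mean is the point of least squares can be checked numerically for the five scores used above. The helper name `sum_sq_dev` and the comparison points are illustrative choices, not part of the text.

```python
scores = [2, 3, 4, 5, 6]
mean = sum(scores) / len(scores)


def sum_sq_dev(values, point):
    """Sum of squared deviations of the values around a chosen point."""
    return sum((v - point) ** 2 for v in values)


at_mean = sum_sq_dev(scores, mean)  # 4 + 1 + 0 + 1 + 4
# Any other reference point gives a larger sum of squares
others = {p: sum_sq_dev(scores, p) for p in (3.0, 3.5, 4.5, 5.0)}
print(mean, at_mean, others)
```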


As narrated above, a regression line is the best-fitting straight line through a set of several points in a scatter diagram, which is the picture of the relationship between two variables. The regression line is described by a mathematical index called the regression equation. A regression equation is an equation for predicting the most probable value of one variable from a known value of another. The general linear regression equation for the straight line is:

\[ Y = bX + a \]

where a is the intercept, that is, the value of Y when X is zero. In other words, the point at which the regression line crosses the Y-axis (a) is found by using the following formula:

\[ a = \bar{Y} - b\bar{X} \tag{23.53} \]

where b is the slope of the regression line, also called the regression coefficient. It is expressed as the ratio of the sum of squares for the covariance to the sum of squares for X. In other words, b = vertical change / horizontal change. In Figure 23.8, two regression lines have been drawn based on the equation Y = a + bX.

Fig. 23.8

The regression equation gives a predicted value for Y on the basis of X. This predicted value is called Y'. The variable being predicted is called the criterion variable and the variable from which it is predicted is called the predictor variable. In equations, the predictor variable is usually denoted as X and the criterion variable as Y. Thus X predicts Y. In the above formula, Y is the criterion variable and X is the predictor variable. A person's predicted score on Y is rarely exactly the same as the person's actual score on that variable. In fact, some difference between the actual score and the predicted score, that is, Y - Y', occurs and this is technically called the residual. Thus, the residual is the difference between the observed and predicted score and is defined as Y - Y'. The best-fitting line actually keeps the residuals to a minimum; in other words, it minimizes the deviation between observed and predicted Y scores. Since residuals can be either positive or negative and will cancel to zero if averaged, the best-fitting line is rightly estimated by squaring each residual. In this way, it can be said that the best-fitting line is obtained by keeping these squared residuals as small as possible. Symbolically, \sum (Y - Y')^2 is equal to minimum.
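A least-squares line can be fitted by hand from these definitions. In the sketch below the five (X, Y) pairs are hypothetical data chosen only for illustration, and `fit_line` is an assumed helper name; b is computed as the sum of cross-products divided by the sum of squares for X, and a from Equation 23.53.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for predicting Y from X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sum of squares for X
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x
    a = mean_y - b * mean_x  # Equation 23.53
    return a, b


x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical criterion scores
a, b = fit_line(x, y)
predicted = [a + b * xi for xi in x]                    # Y'
residuals = [yi - yp for yi, yp in zip(y, predicted)]   # Y - Y'
print(f"a = {a:.2f}, b = {b:.2f}, sum of residuals = {sum(residuals):.2f}")
```

For these data the residuals sum to zero (apart from floating-point rounding), which is why the squared residuals, not the raw ones, are minimized.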


What is the difference between correlation and regression? In fact, there is a difference. The difference is that the correlation coefficient has a reciprocal nature, whereas regression does not have this property. The correlation between X and Y will always be the same as the correlation between Y and X. For example, if the correlation between test anxiety and test performance is .64, the correlation between test performance and test anxiety is also .64. We cannot say a similar thing about regression. In fact, regression is used to transform one variable into the other variable. In other words, regression is often used to predict raw scores on Y on the basis of raw scores on X or vice versa. For example, through a regression equation, the investigator might be interested in predicting scores on a performance test on the basis of scores on, say, the anxiety test. The coefficient denoting the regression of X on Y is usually not the same as the coefficient denoting the regression of Y on X. Since regression uses the raw scores of the variables, the trait of reciprocity does not hold true.


Making predictions on the basis of correlational research is often called regression analysis. Regression analysis tends to show how changes in one variable are related to changes in another variable. In psychological testing, regression is often used to determine whether changes in test scores are related to changes in performance. For example, do persons who score higher on a mechanical reasoning test perform better as mechanical engineers? In fact, regression analysis and related correlational methods show the extent to which these variables are linearly related. Besides, they also often create an equation that estimates scores on the criterion (such as performance as a mechanical engineer) on the basis of scores on a predictor (such as scores on a mechanical reasoning test).

Why is regression also called prediction? As we know, regression literally means going back or returning. The term regression is used here because the predicted score on the criterion variable is closer to the mean of the criterion variable as compared to the distance from the value of the predictor variable to the mean of the predictor variable. However, this rule does not apply when there is perfect correlation between the criterion variable (Y) and the predictor variable (X). Thus we can think in terms of the predicted value of the criterion variable regressing or going back toward the mean of the criterion variable.


MAJOR TERMS AND ISSUES IN CORRELATION AND REGRESSION 


When we go through the report of correlational analysis, we often need to know about the meaning of some important terms and related issues. Such important terms are: residual, coefficient of determination, coefficient of alienation, standard error of estimate, shrinkage, cross validation, correlation-causation problem and third variable explanation. A brief discussion of these terms follows.

1. Residual

Residual, a short form of residual variance, is used in statistics, especially in regression analysis and analysis of variance (ANOVA), where it refers to that part of the variance in the dependent variable that is not accounted for by the independent variables and thus, attributed to chance variation. In a regression equation, the difference between a score and its expected value according to a theoretical model is called the residual. As we have seen, the difference between the observed value and the predicted value of Y is defined as Y - Y'. Let's take an example. Suppose a person has an observed score of 10 and a predicted score of 8.46. The residual will be:

Y - Y' = 10 - 8.46 = 1.54

In regression analysis, the residual has some important properties:

(a) The sum of residuals is always equal to zero (0). Thus \sum (Y - Y') = 0.
(b) According to the principle of least squares, the sum of squared residuals is the smallest value. In other words, \sum (Y - Y')^2 = smallest value.


2. Coefficient of determination (r^2)

The value of r^2 is called the coefficient of determination because it measures the proportion of variability in one variable that is determined by or associated with its relationship to the other variable. For example, a correlation of r = 0.90 means that r^2 = (0.90)^2 = 0.81 (or 81%) of the variability of Y is determined by X, or 81% of the variability of X is determined by Y. Further suppose the investigator finds a correlation of r = 0.75 between IQ and salary of the employees. It means the coefficient of determination will be (0.75)^2 = 0.5625, indicating the fact that 56% of the enhancement in salary is due to the variation in IQ of the employees.


Let us take another concrete example to illustrate the concept of the coefficient of determination. Suppose the investigator has been given the task of predicting the annual income of Mr Rajesh. To make the task of prediction, he is provided with three bits of information:

(a) Mr Rajesh wears a size-9 shoe.

(b) Mr Rajesh has been working for 3 months.

(c) Mr Rajesh has a monthly income of Rs 6,000.

It should be clear that the first bit of information (shoe size) is of no help because there is no correlation between shoe size and annual income. However, the second bit of information (length of employment) does provide some useful information. Since Mr Rajesh has been in employment for the last 3 months only, he is probably in an entry-level job and is earning a relatively low salary. Although this information may improve the accuracy of the prediction, the amount of accuracy would depend on the strength of correlation between salary and seniority. Suppose the investigator gets a correlation of r = 0.38 between salary and seniority. A correlation of 0.38 would mean r^2 = (0.38)^2 = 0.14, or 14% of the variability in salary is determined by seniority. Finally, the third bit of information, that is, monthly income, is highly useful because there is a perfect correlation (r = 1.00) between the monthly income and annual income. Since there is a perfect correlation between monthly income and annual income, the investigator is now able to predict with perfect accuracy. Here the coefficient of determination is r^2 = (1.00)^2 = 1.00, which means that annual income is completely determined by monthly income.


3. Coefficient of alienation (K)

The coefficient of alienation is the measure of nonassociation between two variables. Thus we can think of K as a measure of absence of relationship between X and Y in the same sense that r measures the presence of relationship. Therefore, when K = 1.00, r = 0.00 and when K = 0.00, r = 1.00. With the larger coefficient of alienation there is a smaller extent of relationship and therefore, less precise is the forecast from X to Y and vice versa.

K is calculated as \sqrt{1 - r^2}, where r^2 is the coefficient of determination. In the example of seniority and annual income, the coefficient of alienation is \sqrt{1 - 0.38^2} = \sqrt{1 - 0.14} = \sqrt{0.86} = 0.927.

This means that there is a high degree of nonassociation between seniority and annual income.
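The two indices are simple functions of r, as the following sketch shows. The function name `determination_and_alienation` is hypothetical. Note that the sketch keeps r^2 unrounded, so for r = 0.38 it gives K of about 0.925 rather than the 0.927 obtained above after rounding r^2 to 0.14.

```python
import math


def determination_and_alienation(r):
    """Return the coefficient of determination r^2 and of alienation K."""
    r_squared = r ** 2
    k = math.sqrt(1 - r_squared)
    return r_squared, k


for r in (0.90, 0.75, 0.38, 1.00):
    r_squared, k = determination_and_alienation(r)
    print(f"r = {r:.2f}  r^2 = {r_squared:.4f}  K = {k:.3f}")
```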


4. Standard error of estimate

The standard error of estimate is a measure of the accuracy of prediction. The task of prediction is most accurate when the standard error of estimate is relatively small. As it becomes larger, prediction becomes less and less accurate.

As we know, a regression equation, by itself, allows us to make predictions but it does not provide information about the accuracy of predictions. Therefore, for measuring the precision of the regression, it is customary to compute a standard error of estimate. It is defined as the standard deviation of the differences between actual values of the dependent variable and those estimated from the regression equation. More briefly, it is simply the standard deviation of the residuals.

When Y is predicted from X, the formula for the standard error of estimate becomes:

\[ S_{yx} = S_y \sqrt{1 - r^2} \tag{23.54} \]

When X is predicted from Y, the formula for the standard error of estimate becomes:

\[ S_{xy} = S_x \sqrt{1 - r^2} \tag{23.55} \]
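That the standard error of estimate is the standard deviation of the residuals can be verified numerically. The sketch below uses hypothetical data and assumed function names; it takes S_y as the standard deviation computed with N in the denominator, an assumption under which the formula S_y * sqrt(1 - r^2) and the standard deviation of the residuals agree exactly.

```python
import math
from statistics import mean, pstdev


def standard_error_of_estimate(x, y):
    """S_y * sqrt(1 - r^2), with S_y computed using N in the denominator."""
    mx, my = mean(x), mean(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    r = sp / math.sqrt(ss_x * ss_y)  # Pearson r
    return pstdev(y) * math.sqrt(1 - r ** 2)


def residual_sd(x, y):
    """Standard deviation of the residuals Y - Y' from the least-squares line."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return pstdev([yi - (a + b * xi) for xi, yi in zip(x, y)])


x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical criterion scores
see = standard_error_of_estimate(x, y)
print(round(see, 4), round(residual_sd(x, y), 4))
```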


5. Shrinkage

Sometimes a regression equation is developed on one group of subjects and then used to predict the performance of another group. Shrinkage refers to the amount of decrease observed when a regression equation created for one population is applied to another one. Formulas are available to estimate the amount of shrinkage (Jaccard & Wan 1995; McNemar 1969). Let us take an example. Suppose a regression equation is developed to predict the performance of one group of MBA students in the first two semesters on the basis of MAT scores. Although the proportion of variance in performance in the first two semesters might be fairly high, we can expect to account for a smaller proportion of variance when this equation is used to predict the performance in these two semesters of another group of MBA students selected on the basis of CAT scores. This decrease in proportion of variance accounted for is known as shrinkage.


6. Cross validation

In psychological and educational researches, the term cross validation is frequently used. As explained in Chapter 6 also, in psychometrics, it means evaluating the validity of a test by administering the test to an independent sample drawn from the same population on which it was originally validated. In regression analysis, it means measurement of predictor variables and comparison of predicted and observed values of the criterion variable on a fresh sample drawn from the same population as the sample from which the regression equation was originally derived, for determining the stability of predictions. For doing so, a standard error of estimate can be found for the relationship between the values predicted by the equation and the values actually observed.


7. Correlation-causation problem

Just because two variables are correlated, does it imply that one has caused the other? We are often presented with reports of relationships: cigarette smoking is related to lung cancer, … is related to birth defects. Do these relationships mean that cigarette smoking causes lung cancer and … causes birth defects? The answer is no. Although there may be a causal relationship, the simple existence of correlation does not prove it. Thus mere correlation does not imply causality. However, it might lead to other research designed to establish a causal relationship between variables. Therefore, the existence of correlation by itself does not allow the investigator to decide about the direction of causality: X causing Y (X → Y) or Y causing X (Y → X). However, there is a way to deal with the directionality problem to some extent.





As we know, research psychologists generally conclude that X causes Y when they occur together, when X precedes Y, when X causing Y makes sense according to some theory and when other explanations can be ruled out. In correlational situations, X may cause Y, but Y causing X can't be ruled out. It is possible to enhance confidence in attributing causality by using a cross-lagged panel correlation. This technique compares correlations between variables at several points in time. Therefore, it is a longitudinal design; lagged means that there is a time lag between the two times of measurement, and it is called cross because the variables are correlated across these two times.

Eron et al. (1972) conducted a well-known study in which the cross-lagged panel correlation technique was used. They measured preference for watching violent television programmes (A) and peer ratings of aggressiveness (B). Students from the third grade participated in the study in 1960. A modest correlation was found between preference for violent TV and aggressiveness. After 10 years, that is, in 1970, students from the same group were reassessed on the two variables. Measuring the two variables (A and B) at two points in time (Time 1 and Time 2) yields four different sets of measures: A1, A2, B1 and B2. A1 means score on variable A at Time 1 and B2 means score on variable B at Time 2. Among these four sets of measures, six correlations can be calculated.


Fig. 23.9 Cross-lagged panel study showing impact of preference for violent TV programme on later aggression. Panels: preference for violent TV among third graders (A1); preference for violent TV among 13th graders (A2); aggression among third graders (B1); aggression among 13th graders (B2).


Of special interest here are the diagonal or cross-lagged correlations because they measure the relationship between variables separated in time (see Figure 23.9). If third-graders' aggressiveness really caused a later preference for watching violent TV programmes, that is, B causing A (B1 vs. A2), then we should expect a fair-sized correlation between aggressiveness at Time 1 (B1) and preference at Time 2 (A2), but the correlation is virtually zero (+0.01). On the other hand, if an early preference for viewing violent TV programmes led to a later aggressiveness, that is, A causing B (A1 vs. B2), then the correlation between preference at Time 1 (A1) and aggressiveness at Time 2 (B2) should be substantial. This correlation is +0.31, definitely not large but significant.

Based on this finding, it was concluded by Eron et al. (1972) that an early preference for watching violent TV programmes is, at least partially, the cause of later aggressiveness. The other four correlations in the study conducted by Eron et al. (1972) were 0.05 (A1 vs. A2), 0.21 (A1 vs. B1), 0.38 (B1 vs. B2) and -0.05 (A2 vs. B2). These correlations were not of much importance. On the basis of the size of the correlation, it is possible to tell which variable causes the other. If the correlation between A1 and B2 is greater than that between B1 and A2, one can reasonably conclude that changes in A cause changes in B rather than vice versa. Kahle and Berman (1979) used this cross-lagged panel correlation to answer the question whether attitude causes behaviour or vice versa and reported that attitude was more likely to cause behaviour.

8. Third variable explanation

In correlational research, often a high positive correlation between variable A, such as television viewing, and variable B, such as aggressive behaviour, does occur. Does this mean that A causes B or B causes A? In other words, does this mean that excessive TV viewing causes aggressive behaviour or that aggressive children prefer to watch a lot of television? There is another possibility behind this observed relationship. Thus when a correlation is found, the investigator should be aware of the possibility that the correlation is due to association between each of the two variables and a third variable. This is called the third variable explanation. For example, poor social adjustment can cause both excessive TV viewing as well as aggressiveness. Likewise, among a group of elementary-school children, the investigator may find a high correlation between size of vocabulary and height. This correlation may not reflect a causal relation between these two variables; rather, it may result from the fact that both vocabulary size and height are associated with a third variable, that is, age. Thus the apparent relationship between two variables might be the result of some variable not included in the analysis. This external influence is known as the third variable explanation.
CHOOSING APPROPRIATE STATISTICAL TESTS

It is customary to choose the appropriate statistical test on the basis of the nature of the obtained data. If the data fulfil the requirement of parametric assumptions, any of the parametric tests which suit the purpose can be selected. On the other hand, if the data do not fulfil the parametric requirements, any of the nonparametric statistical tests which suit the purpose may be selected. Other things to be kept in mind in selection of appropriate statistical tests are the number of independent and dependent variables and the nature of the variables, that is, whether they are nominal, interval or ordinal. When both the independent variable and dependent variable are interval measures and are more than one, multiple correlation is the most appropriate statistic. On the other hand, when they are interval measures and their number is only one, Pearson r may be used. As has been noted earlier, with ordinal and nominal measures, the nonparametric statistics are the common choice.

Sometimes, researchers transform the measures so that the appropriate statistical test may be applied without loss of much information. For example, if the scores of two groups on interval measures are available but the data do not fulfil the requirement of the t test, the researcher can transform the interval measures into ordinal measures and subsequently, apply the Mann-Whitney test.

In this chapter, we have described several statistical tests. These, and a few others not described in this text, are summarized in Table 23.35, which shows the purpose of each statistical test and the types of data for which it is considered more appropriate.

Table 23.35 Summary of some common statistics with their purposes and appropriateness of data for which they are suitable

Name of statistical test | Parametric (P) / Nonparametric (NP) | Purpose | IV (Independent variable) | DV (Dependent variable)
Pearson r | Parametric | Measures linear relationship between two variables | One interval or ratio | One interval or ratio
Multiple correlation (R) | Parametric | Measures linear relationship between more than two variables | Two or more intervals or ratios | One interval or ratio
t test | Parametric | Tests the significance of difference between two independent or dependent groups | One nominal | One interval or ratio
ANOVA (F) | Parametric | Tests the significance of difference between more than two groups | One or more nominal variable | One interval or ratio
ANCOVA (analysis of covariance) | Parametric | | One or more nominal | One interval or ratio
Multivariate ANOVA* (MANOVA) | Parametric | Tests the difference between more than two groups on more than one variable | | Two or more intervals or ratio
Chi-square | Nonparametric | | One nominal |
Median test | Nonparametric | Tests the difference between medians of two independent samples | | One ordinal
Mann-Whitney U-test | Nonparametric | Tests the difference between ranks of two independent samples | One nominal | One ordinal
Phi correlation | Nonparametric | Measures the relationship between two dichotomous variables | One nominal |
Spearman's rank order rho (ρ) | Nonparametric | Measures the relationship between two measures | One ordinal |
Kruskal-Wallis H-test | Nonparametric | Tests the difference between ranks of more than two independent samples | One nominal | One ordinal
Friedman rank test | Nonparametric | Tests the difference between ranks of more than two observations done on the same sample | One nominal | One ordinal
Coefficient of contingency (C) | Nonparametric | Tests relation between two variables classified into equal row and column such as 3 x 3, 4 x 4, etc. | One nominal |

* Not discussed in the present textbook







Factor Analysis 


The technique of factor analysis originated with C Spearman. Factor analysis may be
defined as a multivariate statistical method, which is used in the analysis of tables or matrices of
correlation coefficients. Spearman analyzed tables of intercorrelations between psychological
tests and was able to demonstrate that the intercorrelations could be accounted for in terms of
one general factor, called the g factor, common to all tests, and factors which were specific or
unique to each test, called specific factors or s factors. This was known as the theory of two factors.
Following Spearman, work on factor analysis was done by G H Thomson, L L Thurstone,
H Hotelling, J P Guilford and R B Cattell.

Factor analysis is basically used to study the interrelationships among a set of variables
without reference to a criterion. Thus factor analysis may be understood as a data-reduction
technique. Broadly, there are two forms of factor analysis: confirmatory and exploratory. In
confirmatory factor analysis, the purpose is to confirm that test scores and variables fit a certain
framework predicted by a theory. For example, if a theory underlying a certain
intelligence test predicted that the various subtests belong to three factors such as verbal
comprehension, attention, and memory factors, then a confirmatory factor analysis may be
undertaken to evaluate the accuracy of this prediction. In fact, confirmatory factor analysis is
essential to the validation of many ability tests.

Exploratory factor analysis is comparatively more popular. Its central purpose is to
summarize the interrelationships among a large number of variables in a concise and accurate
way, as an aid in conceptualization. For instance, an exploratory factor analysis may
help a researcher discover that a battery of 25 tests represents only five underlying variables,
usually called factors.

Factor analysis is usually applied to data where a distinction between dependent variable 
and independent variable is not meaningful. Major concern is with the description and 
interpretation of interdependencies within a single set of variables. Factor analysis achieves 
these purposes in two ways—first, it reduces the original set of variables to a smaller number of 
variables or factors; and second, factors tend to acquire meaning because of structural properties 
that may exist within the set of relationships. Thus the process of reducing the number of 
variables and the concept of structure are basic to the understanding of factor analysis. 

All techniques of factor analysis begin with preparation of a complete table of
intercorrelations among a set of tests. Such a table is known as a correlation matrix, which shows
the correlation between every variable and every other variable. Subsequently, we find the
linear combinations or principal components of the variables that describe as many of the
interrelationships among the variables as possible. In fact, each principal component, also called
a factor, is extracted according to mathematical rules (which are complex) that make it
independent of, or uncorrelated with, the other principal components. The first component is usually
most successful in describing the variation among the variables, with each succeeding
component being less successful. A factor must be viewed as a variable, but with a simple
difference: factors are hypothetical variables derived by a process of analysis from variables
obtained by direct measurement. Despite this, a person may be said to have a score on a factor
in the same sense that the individual has a score on a test. When factors are uncorrelated, they
are said to be orthogonal and the solution is called an orthogonal solution. When factors are
correlated, they are said to be oblique and the solution is called an oblique solution. As we know,
each factor is a practical combination of the tests used, produced by adding in carefully
determined portions of some tests and subtracting out fractions of other tests. What makes
factors special is the analytical method used to derive them. Several different methods are
available and some of the common ones are: principal component factors, principal axis factors,
method of unweighted least squares, image-factoring, alpha factoring and maximum likelihood
methods. The discussion of these methods is beyond the scope of this book.
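The extraction idea described above can be illustrated with a small sketch. The correlation matrix below is invented for illustration, and the power-iteration routine is one simple way of obtaining principal components, chosen here only because it needs no external library; it is not the specific procedure used in any particular statistical package. The function names are hypothetical.

```python
import math

# Hypothetical correlation matrix for four tests (illustrative values only).
R = [
    [1.00, 0.60, 0.20, 0.10],
    [0.60, 1.00, 0.15, 0.20],
    [0.20, 0.15, 1.00, 0.50],
    [0.10, 0.20, 0.50, 1.00],
]

def mat_vec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def first_component(m, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v

def extract_components(corr, k):
    """Extract k components, removing each one from the matrix before finding the next."""
    n = len(corr)
    residual = [row[:] for row in corr]
    components = []
    for _ in range(k):
        eigenvalue, v = first_component(residual)
        # A loading is the correlation between a test and the component.
        loadings = [math.sqrt(eigenvalue) * x for x in v]
        components.append((eigenvalue, loadings))
        # Subtract the part of the correlations that this component accounts for.
        residual = [[residual[i][j] - loadings[i] * loadings[j] for j in range(n)]
                    for i in range(n)]
    return components

for number, (eigenvalue, loadings) in enumerate(extract_components(R, 2), start=1):
    share = eigenvalue / len(R)  # proportion of total variance described
    print(number, round(share, 3), [round(x, 2) for x in loadings])
```

In this sketch the first component describes the largest share of the variance and the second a smaller share, and the two sets of loadings are uncorrelated with each other, which mirrors the description of an unrotated orthogonal solution.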


A factor matrix consists of a table of correlations between the tests and the factors. Each such
correlation is called a factor loading, showing the weight or load of a factor in a test.
Table 23.36 presents a hypothetical factor matrix in which only two factors are shown. The
factors are listed across the top and their weights in each of the tests are given in the rows.


Table 23.36: A hypothetical factor matrix

Tests | Factor I | Factor II
Sentence completion | 0.75 | -0.24
Vocabulary | |
Reading comprehension | |
Analogies | |
Arithmetical problems | |





Thus factor loadings range from -1, a perfect negative correlation with a factor, through 0
(no relation to the factor) to +1, a perfect positive correlation. Variables have loadings on each
factor but will usually have high loadings on only one. Normally a variable is considered to contribute
meaningfully to a factor only if it has a loading above 0.30 (or below -0.30). Note that
the first variable (or test), sentence completion, has a strong positive loading of 0.75 on factor I,
indicating that this test is a reasonably good index of factor I. Also note that this variable has a
modest loading of -0.24 on factor II, indicating that, to a slight extent, it measures the opposite of
this factor, that is, high scores on sentence completion tend to signify low scores on factor II and
vice versa. In a sense, a factor is produced by ‘adding in’ carefully determined portions of some
tests and perhaps ‘subtracting out’ fractions of other tests. What makes factors special is the
elegant analytical methods undertaken to derive them.
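The 0.30 rule of thumb for reading a factor matrix can be sketched in a few lines. Only the sentence completion loadings (0.75 and -0.24) come from the text; the other rows, the function name and the dictionary layout are assumptions made purely for illustration.

```python
# Hypothetical factor matrix: test name -> (loading on factor I, loading on factor II).
# Only the sentence completion row is taken from the text; the rest is invented.
FACTOR_MATRIX = {
    "Sentence completion": (0.75, -0.24),
    "Vocabulary": (0.70, 0.10),
    "Arithmetical problems": (0.05, 0.65),
}

THRESHOLD = 0.30  # rule of thumb stated in the text

def salient_factors(loadings, threshold=THRESHOLD):
    """Return the 1-based numbers of the factors with a loading above +threshold or below -threshold."""
    return [i + 1 for i, value in enumerate(loadings) if abs(value) > threshold]

for test, loadings in FACTOR_MATRIX.items():
    print(test, salient_factors(loadings))
```

Under this rule, sentence completion counts as a meaningful indicator of factor I only, because its loading of -0.24 on factor II falls inside the band between -0.30 and 0.30.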
Several different methods for analyzing a set of variables into common factors are available.
As early as 1901, Pearson pointed the way for this type of analysis. Later on, Spearman (1927)
developed a precursor of modern factor analysis. In America, Kelley (1935) and Thurstone (1947)
and in England, Burt (1941) did much to advance the method. Although these methods differ in
their initial postulates, most of them yield similar results. For a detailed study of the
specific procedures of factor analysis, the reader is referred to texts by Comrey and Lee (1992)
and Loehlin (1992).
There are some basic issues in factor analysis that should not be ignored. A review of
literature shows that factor analysis is frequently misused and often misunderstood. Some
investigators appear to use factor analysis as a kind of divining rod, hoping to find gold hidden
underneath tons of dirt. But the reality is that there is nothing magical about this technique:
neither factor analysis nor any other statistical analysis can rescue data based on trivial, irrelevant
or haphazard measures. However, factor analysis can yield meaningful results if the
research was meaningful to start with. The following three basic issues should be kept in mind:
(i) A particular kind of factor can emerge from factor analysis only if the tests and
measures contain that factor in the first place. For example, a factor cannot
emerge from a battery of ability tests if none of the tests requires the corresponding function. Thus,
in factor analysis the quality of the output depends upon the quality of the input.
(ii) For a stable factor analysis, the size of the sample is also important.
According to Tabachnick and Fidell (2006), it is comforting, in general, to
have at least five subjects for each test or variable. Comrey's guidelines suggest that a
sample size of 50 is very poor, 100 poor, 200 fair, 300 good, 500 very good and 1000
excellent for a dependable factor analysis.
(iii) A third issue is that we can't overemphasize the extent to which the technique of factor
analysis is guided by subjective choices and theoretical prejudices. A crucial point in
this regard is the choice between orthogonal and oblique axes. With orthogonal axes,
the factors lie at right angles to one another, which means that they are
uncorrelated. In some cases, the clusters of factor loadings are such that oblique axes
provide a better fit. With oblique axes, the factors are correlated among themselves.
Oblique axes are considered better than orthogonal axes because with oblique
rotations, it's possible to factor analyze the factors themselves. Such a procedure tends to
yield one or more second-order factors. Second-order factors tend to provide support for
the hierarchical organization of traits/abilities, etc.

Important advantages of factor analysis are as under:

(i) Through factor analysis, the researcher may combine several variables into a single
factor. For example, the researcher may combine, through factor analysis, variables such
as jumping, ball throwing and weightlifting into a single factor called
general athletic ability.

(ii) Factor analysis helps the researcher in identifying the groups of interrelated variables to
see how they are related with each other. For example, Carroll used factor analysis
to develop his three-stratum theory of intelligence. He reported that a factor called
broad auditory perception was related to auditory task ability and another factor called
broad visual perception was related to visual task capability. He also reported that a
global factor called g-factor (or general intelligence) was related to both these factors,
that is, broad auditory perception and broad visual perception. This, then, implies
that someone high on g-factor will also tend to be high on both broad auditory
perception and broad visual perception.

Disadvantages are as under:

(i) In factor analysis, there is occasional disagreement over the proper label for the factors.
The technique itself only identifies factors and what they should be called is left to the
researcher's intuition and judgement.

(ii) Factor analysis can be good only to the extent the obtained data allow. In psychology and
education, where researchers often have to depend on less valid and reliable measures
such as self-reports, ratings, etc., the technique can prove to be a problematic one.

(iii) In factor analysis, more than one interpretation can be made from the same data factored
the same way. Factor analysis fails to identify this type of causality.


Review Questions
1. What is normal curve? Discuss the major characteristics of a normal curve.
2. Distinguish between parametric statistics and nonparametric statistics.
3. What is a Chi-square test? Discuss its uses and applications.
4. Distinguish between Type I error and Type II error in statistical significance. Also,
explain the role that level of significance plays in influencing the probability of making
each of these two types of errors.
5. What is Yates’ Correction? Under what conditions is it applied?
6. Why is Student's t so named?
7. Discuss the difference between Correlation and Regression.
8. Write short notes on the following:
(a) Correlation and regression
(b) Rank difference correlation
(d) Multiple correlation


24
WRITING A RESEARCH REPORT AND A RESEARCH PROPOSAL


CHAPTER PREVIEW

• General Purpose of Writing a Research Report
• Structure or Format of a Research Report
    Title Page
    Abstract
    Introduction
    Method
    Results
    Discussion
    References
    Appendix
    Author Note
• Style of Writing a Research Report
• Typing the Research Report
• Evaluating a Research Report
• Preparing a Research Proposal


GENERAL PURPOSE OF WRITING A RESEARCH REPORT 


The writing of a research report is no less challenging a task than the research itself. It requires 
imagination, creativity and resourcefulness. Research reports should be written in a dignified and 
objective style, although there is no one such style which is acceptable to all. 


The general purpose of a research report is not to convince the readers but to let them know
what has been done, why it was done, what results were obtained and what the conclusions of
the researcher were. Therefore, the research report aims at telling the readers the problems
investigated, the methods adopted, the results found and the conclusions reached. The research
report should be written in clear and unambiguous language so that the reader can also
objectively judge the adequacy and validity of the research. For attaining objectivity, personal
pronouns such as I, you, we, my, our, etc., should be avoided and as their substitutes, expressions
like ‘investigator’, ‘researcher’ should be used. Needless to say, the highest standard of correct
usage of words and sentences is expected.

To achieve these purposes is not an easy task. The matter is more complicated due to 
variations in style of writing research reports. However, the problem can be minimized if we 
adopt the style for research reports presented in the next section. 


STRUCTURE OR FORMAT OF A RESEARCH REPORT (Style Manual) 

The research report, whether it is based on a dissertation or short-term research paper, follows
a fairly standardized pattern. There are different types of style manuals. In
fact, different departments or institutions develop their own style manual to which their theses and
dissertations must conform. Careful examination, however, reveals that in general, all style
manuals basically agree on the principles of correct and clear presentation and differ with
respect to minor details only. The style manuals commonly used in social sciences and
humanities are: The Publication Manual of the American Psychological Association, 5th ed.;
The Chicago Manual of Style, 16th ed. (2010); the MLA Handbook for Writers of Research
Papers; Form and Style: Research Papers, Reports, Theses, 11th ed. (Slade); and Turabian's
Manual for Writers of Term Papers, Theses and Dissertations, 6th ed. Of these,
the most widely used style manual is the Publication Manual of the American
Psychological Association. The Publication Manual has some excellent suggestions as to how
to write a report effectively, how to present the basic ideas with smoothness of concise
expression, and how to avoid ambiguity and increase readability of the report. The following
outline or format is the most popular research format prepared according to the American
Psychological Association's Publication Manual, 5th ed. (2002).


I. Title Page
    A. Title
    B. Author's Name and Affiliation
    C. Running Head
II. Abstract
III. Introduction (no heading used)
    A. Statement of the Problem
    B. Background/Review of Literature
    C. Purpose and Rationale/Hypothesis
IV. Method
    A. Subjects
    B. Apparatus or Instrumentation (if necessary)
    C. Design (if a complex design has been incorporated)
    D. Procedure
V. Results
    A. Tables and Figures
    B. Statistical Presentation
VI. Discussion
    A. Support or Nonsupport of Hypotheses
    B. Practical and Theoretical Implications
    C. Conclusions
VII. References
VIII. Appendix (if appropriate)
IX. Author Note
    A. Identify each author’s departmental affiliation
    B. Identify sources of financial support
    C. Acknowledge contributions to articles
    D. Give contact information for further information


Thus we find that the APA Publication Manual divides the format of writing a research report
into nine parts. A discussion of these parts is given below.



I. Title Page

The first page of the report is the title page, which contains three elements: the title, the author's
name and affiliation, and the running head. A discussion of these elements is given below.

A. Title and running head: The title should be concise and should clearly indicate the
purpose of the study. It should not be stated so broadly that it seems to provide an answer
that can't be generalized either from the data gathered or from the methodology employed.
Abbreviations should not be used in the title. The recommended length of a title is 12 to 15 words.
Introductory phrases such as “A Study of...” or “An investigation of...” should be
avoided since it is understood that the investigator is studying something. Thus,
expressing the title like “A Study of unlearning, spontaneous recovery and partial
reinforcement effects in paired associate and serial learning” should be avoided. The preferred
title would be “Unlearning, spontaneous recovery and partial reinforcement effects in paired
associate and serial learning.” The title should be typed in upper and lower case letters,
centred on the page, and when two or more lines are needed, they should be double-spaced.

The running head is an abbreviated or short title which is printed at the top of the pages of a
published article to identify the article. On the title page, the running head, that is, the short title,
which is a maximum of 50 characters including letters, punctuation and spaces between words,
is typed in all capital letters at the bottom of the page. The running head in the above example
may be “Unlearning, Reinforcement and Learning.”
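The two length rules stated above (a running head of at most 50 characters, and a title of about 12 to 15 words) are easy to check mechanically. The following is a minimal sketch; the function names are hypothetical and the checks cover only these two rules, not the rest of the style manual.

```python
MAX_RUNNING_HEAD_CHARS = 50  # limit stated in the text: letters, punctuation and spaces all count

def format_running_head(short_title):
    """Return the short title in capital letters, or raise ValueError if it is too long."""
    head = " ".join(short_title.split()).upper()  # collapse stray whitespace first
    if len(head) > MAX_RUNNING_HEAD_CHARS:
        raise ValueError(
            f"running head has {len(head)} characters; the limit is {MAX_RUNNING_HEAD_CHARS}"
        )
    return head

def title_word_count(title):
    """Count the words in a title so it can be compared with the 12 to 15 word guideline."""
    return len(title.split())

FULL_TITLE = ("Unlearning, spontaneous recovery and partial reinforcement effects "
              "in paired associate and serial learning")

print(format_running_head("Unlearning, Reinforcement and Learning"))
print(title_word_count(FULL_TITLE))
```

Passing the full title to format_running_head raises an error, because it is far longer than 50 characters; this is why a shortened form is needed as the running head.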

B. Author's name and affiliation: On the title page, the author’s name should be centred
below the title, and the next line should indicate the name of the institution to which the author is
affiliated. In case of more than one author from the same institution, the affiliation should be
listed last and only once.


UNLEARNING, REINFORCEMENT AND LEARNING


Unlearning, Spontaneous recovery, and Partial reinforcement effects
in Paired associate and Serial learning


ALOK K SINGH
PATNA UNIVERSITY, PATNA

I wish to acknowledge the assistance of the computer facilities rendered by the
concerned institutions. I also wish to thank many students, teachers and principals of the
various schools from where the reported data were collected. I also wish to thank K K Sinha,
PhD, for his critical comments and B Sen, PhD, for his secretarial assistance.

Running head: UNLEARNING, REINFORCEMENT AND LEARNING



Fig. 23.1. An illustration of a title page 


II. Abstract


The abstract is written on a separate sheet of paper, which is page 2. It describes the study in about
100 to 150 words. In fact, the abstract is the summary of the study, which includes the problem
under study, the method (such as characteristics of the subjects, research design and apparatus),
results (including statistical significance levels) and conclusions as well as implications.


References must not be cited in the abstract. 


III. Introduction


This starts on a fresh page and that is page 3. From this, the text of the report begins. The
sections of the text follow one another without a break, that is, the Introduction, Method,
Results and Discussion sections follow one after another without any break and it is not
necessary that they are started on a new page. Page 3, where the Introduction starts, is not
labelled as ‘Introduction’. However, the running head on the upper right side of the page and
the complete title, without the name of the author, in the centre of the page are given. A good
introduction has three components:
(i) The researcher must give a clear and definitive statement of the problem and
develop it logically in the light of pertinent studies. He or she must indicate the need for the
present research and why the problem is important in terms of theory and practice.
(ii) The second important component of the introduction is the review of the previous literature.
The researcher must try to establish an understanding of the existing literature relevant to
his or her study. He or she needs to connect logically the previous body of literature with the
present study.
(iii) The third and final component of the introduction is to formulate a clear rationale of the
hypotheses to be tested. Every hypothesis must be clearly stated so that it is clear how
it would be scientifically tested. Different variables and terms should be properly defined
and investigated.
IV. Method


The main body of the report continues with the method section after the introduction. The major
purpose of this section is to tell the reader how the research was conducted. The method section
is separated from the introduction by the centred heading ‘Method’. The subsections are
subsequently labelled at the left margin and underlined. In general, the following subsections,
in order, are included in the method section.


A. Subject: The population should be clearly specified and the method of drawing
samples should also be stated. In specifying the population, characteristics such as age, sex,
geographical location, SES, race, institutional affiliation, etc., should be clearly mentioned. Not
only that, the total number of subjects and the number assigned to each condition should also be
spelled out. The researcher should also state that the treatment given to subjects was in
accordance with the ethical standards of APA.


B. Apparatus: If the research has been conducted with the help of some relevant
apparatus, the names and model numbers should also be mentioned so that another
researcher, if he or she wishes, may obtain or construct it. If possible, the researcher should provide a
diagram of the apparatus in the write-up and this is extremely important where the apparatus is a
complex and novel one. Minor pieces of apparatus such as pencil, pen, blade, etc., should not be
listed.


C. Design: The type of research design should also be spelled out after the subsection of
apparatus.* Here the procedure for assigning participants to groups as well as labels to groups
(experimental or control) should also be indicated. IV and DV should be clearly spelled out and
these variables should be carefully defined if they have not been defined in the introduction
section. The technique of exercising experimental control should also be spelled out.

D. Procedure: This subsection describes the actual steps carried out in conducting the
study. This includes the instructions given to the subjects (if they are human), how the IV was
administered and how the DV was measured. It also includes assignment of subjects to conditions and
order of assessments if more than one. Anyway, enough information must be provided to permit
easy replication by the subsequent researcher.

* In the absence of a complex design, the ‘design’ subsection may be dropped and rubrics of design may
be included in the procedure subsection.


V. Results

The next section of the main body is the results, which provides information about how the
conclusion was reached. Here the data relevant to test the hypothesis are presented. All relevant
results should be presented, including those that do not support the hypothesis. Tables and
figures are commonly employed for supplementing textual material. A table consists of several
numbers that summarize the major findings of the experiment. A figure is a graph, photograph,
chart or like material, which is relevant for certain kinds of data like showing the progress of
learning or maturation over a designated period. Data in the text and in tables or figures should
not be redundant, rather they should be complementary. Results of the statistical analyses
carried out should be provided and the level of significance for these statistical analyses should
also be presented. However, they should not be interpreted and discussed in this section.



VI. Discussion

The final section of the main body of the report is discussion. The major function of this section
is to interpret the results of the study and to relate those results to other studies. The implications
of the study, including the hypotheses, supported or not supported, are discussed. If the findings
are contrary to the hypothesis, some new explanation is required so that a new hypothesis may be
advanced. New hypotheses may also be advanced about any uncommon deviation in the results
obtained. Sometimes faulty hypotheses can be modified to make them consistent with the results
obtained. Negative results should also be discussed. Such results occur when a hypothesis
predicts something but the results don’t support that prediction. A brief speculation about why
they occurred is sufficient. A brief discussion of the limitations of the present study and proposals
for future research is appropriately included here. Here the researcher finally includes conclusions
that reflect whether the original problem is better resolved as a result of the investigation.

VII. References


The ‘References’ section begins on a new page with the label ‘References’ at the centre.
References comprise all documents including journals, books, technical reports, computer
programs and unpublished works mentioned in the text of the report. References are arranged
in alphabetical order by the last name of the author(s) and the year of publication in parenthesis
or in case of unpublished citations, only the reference is cited. Sometimes no author is listed and
then, in that condition, the first word of the title or sponsoring organization is used to begin the
entry. When more than one name is cited within parenthesis, the references are separated by
semicolons. In parenthesis, the page number is given only for direct quotations. The researcher
should check carefully that all references cited in the text appear in the references and vice versa.


References should not be confused with Bibliography. A bibliography contains everything
that is included in the reference section plus other publications which are useful but were not
cited in the text or manuscript. Bibliography is not generally included in research reports. Only
references are usually included. The Publication Manual of the American Psychological
Association, 5th ed. (2002) has given some specific guidelines for writing references of various
types of works as indicated below.


1. For references of books with single author:
Siegel, S (1956) Nonparametric Statistics for the Behavioural Sciences, New York:
McGraw-Hill.

2. For references of books with multiple authors:
Guilford, J P & Fruchter, B (1978) Fundamental Statistics in Psychology and Education.
New York: McGraw-Hill.

3. For references of Editor as author:
Misra, G. (ed.) (2011), Psychology in India. vol. 4. Delhi: Pearson.

4. For references of corporate or association as author:
American Psychological Association (1983) Publication Manual (3rd ed.) Washington.

5. For references of journal article:
Edwards, A L & Kenny, K C (1946). A comparison of Thurstone and Likert techniques
of Attitude Scale Construction, Journal of Applied Psychology, 30, 72-73.

6. For references of thesis or dissertation (unpublished):
Singh, A K (1978) Construction and Standardization of a Verbal Intelligence ...
Unpublished Doctoral Dissertation, Patna University, Patna.

7. For reference of chapter in an edited book:
Atkinson, R C & Shiffrin, R M (1968) Human Memory: A proposed system and its control
processes. In The Psychology of Learning and Motivation, ed. K W Spence & J T
Spence, vol. 2, pp 89-195. New York: Academic Press.

8. For references of magazine articles:
Basant, K. (2013, August), The usefulness of psychology to common people. Psychology
Today and Tomorrow. 15-20.

9. For references of unpublished paper presented at a meeting:
Singh, A. K., Kumar, P & Sen Gupta, A (2005, July) Demographic Correlates of Job Stress
among College Students. Paper presented at the Annual Meeting of Council for Social
Sciences, Lucknow.

10. For references of unpublished manuscripts:
Gupta, R. (1995) Social awareness in relation to media a... Unpublished Manuscript,
Ranchi University, Ranchi.
Orne, M. K. (2013) A scale to assess Big Five Personality dimensions. Manuscript
submitted for Publication.

11. For references of citations from internet sources:
American Psychological Association (2001) APA style Homepage. Retrieved July 30,
2001, from http://www.apastyle.org/



VIII. Appendix


In an appendix, those items of information are provided that would be inappropriate or too long
for the main body of research. Each appendix appears on a new page with the label ‘Appendix’
along with the identifying letter, centred. Usually in an appendix, materials such as tests, detailed
statistical treatments, computer programs, etc., are included.

IX. Author Note


In this part the researcher writes about four basic things: source of financial support,
acknowledgement, contact information and each author's affiliation, in case many authors have
contributed to the research work.


STYLE OF WRITING A RESEARCH REPORT 

The research report should be written in a style that is clear and concise. Slang, hackneyed and
folksy style should be avoided. The research report should describe and explain rather than try to
convince. Only the last name(s) of the cited author(s) should be used. Titles like Dr, Mr,
Professor, Dean, etc., should be omitted. In describing research procedures, the past tense should
be used. Abbreviations should be used only after their referent has been clearly spelled out, with
the abbreviation in parenthesis. Of course, there are a few exceptions to this rule for well-known
abbreviations. Important and standard statistical formulas are not presented in the research
report. Detailed statistical computations are also not included. However, if some unusual
formulas are used in statistical analysis, it is appropriate to mention them.

Numbers beginning a sentence should always be spelled out. Fractions and numbers less
than ten should be spelled out. The expression ‘per cent’ (which means per hundred) should be
spelled out unless it is in tables and figures. Arabic numerals with per cent, such as 20 per cent,
should be used unless they begin a sentence. Commas should clearly indicate thousands or
millions, such as 20,305,632.


Apart from these, some general rules should be followed in writing a research report/article that conforms to APA style. These general rules may be discussed as under:

1. Since a research report describes a completed study, it should be written in the past tense. The exception is to write in the present tense only the conclusions that apply to present or future situations.

2. All sources from which information is obtained are cited by only the last names of the authors and the year. The reference can be used as the subject of the sentence: Hilgard and Bower (year) defined learning "...". Alternatively, the idea can be stated and the reference provided in parentheses: Learning is defined as "..." (Hilgard & Bower, year). In parentheses, "&" is used in place of "and". When there are three to six authors, all names are written the first time of citation and, subsequently, the work can be referred to by using the first author and the Latin phrase et al. Thus, in the first citation, we can write Hilgard, Bower and Atkinson (1999), but subsequently, we can say "Hilgard et al. (1999)". If there are more than six authors, we can cite the name of the first author and et al., even the first time.

3. Abbreviations should be avoided. They are justified only if (a) the term appears very frequently throughout the report, (b) the term consists of several words, and (c) numerous different abbreviations are not being used. If an abbreviation has to be used, this should be done by creating an acronym, using the first letter of each word of the term. The complete term should be defined the first time it is used, with its acronym in parentheses. For example, Long-Term Memory (LTM) is defined as "...". Subsequently only the acronym, except as the first word of a sentence, should be used. As the first word of a sentence, the complete term should be used.


4. As far as possible, direct quoting should be avoided. The idea should be paraphrased and summarized so that readers know what the authors intend to say. Besides, a reference should be made to address a study itself, not its authors. For example, the phrase Hilgard et al. mainly refers to a reported study and not to the people who conducted it. In this way, we should write "The results are reported in Hilgard et al. (1999)" instead of "The results are reported by Hilgard et al. (1975)".


5. For distinguishing your study from other studies, refer to it as "this study" or "the present study".

6. As far as possible, standard terminology should be used. When a nonstandard term or name of a variable is to be used, define the term or word for the first time and then use that word consistently.

7. Numbers between zero and nine should be spelled out, and numbers that are 10 and larger should be written in digits. However, digits are used for a number of any size if (a) the author is writing a series of numbers in which at least one is 10 or larger, or (b) the number expresses a decimal or refers to a statistical result or to a precise measurement. No sentence should begin with a number expressed in digits.

8. Although in most research reports the term subjects is used, the APA style prefers the use of a less impersonal term for it. Therefore, participants rather than subjects should be written, but where appropriate, more descriptive terms such as men, women, children, pigeons, etc., should be used.

If a research report is written keeping the above general rules in view, it would provide readers the information necessary for (a) understanding the study, (b) evaluating the study and (c) performing a literal replication of the study.
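Rules 2 and 7 above are mechanical enough to sketch in code. The following Python helpers are only an illustration of those two rules as stated in this section; the names `cite` and `format_number` and their parameters are hypothetical, and the sketch does not try to cover the full APA style (for instance, it does not handle a number that begins a sentence).

```python
NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def cite(authors, year, first_citation=True, parenthetical=False):
    """Build an in-text citation from last names and year (rule 2)."""
    joiner = "&" if parenthetical else "and"
    count = len(authors)
    # More than six authors: "et al." even the first time.
    # Three to six authors: "et al." only after the first citation.
    if count > 6 or (count >= 3 and not first_citation):
        names = f"{authors[0]} et al."
    elif count == 1:
        names = authors[0]
    elif count == 2:
        names = f"{authors[0]} {joiner} {authors[1]}"
    else:
        names = f"{', '.join(authors[:-1])} {joiner} {authors[-1]}"
    return f"({names}, {year})" if parenthetical else f"{names} ({year})"


def format_number(value, in_series_with_large=False, is_statistic=False):
    """Spell out whole numbers from zero to nine; use digits otherwise (rule 7)."""
    small_whole = isinstance(value, int) and 0 <= value <= 9
    if small_whole and not in_series_with_large and not is_statistic:
        return NUMBER_WORDS[value]
    return str(value)


authors = ["Hilgard", "Bower", "Atkinson"]
print(cite(authors, 1999))                        # first citation
print(cite(authors, 1999, first_citation=False))  # later citations
print(format_number(7), format_number(12))
```

The two flags on `format_number` stand in for the exceptions (a) and (b) of rule 7; deciding when they apply is left to the writer.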








TYPING THE RESEARCH REPORT

It is the responsibility of a research writer to present manuscript material to the typist in proper form. Except for minor typographical matters, the correction of major errors is not the responsibility of the professional typist. While typing a research report, the following guidelines should carefully be followed:

(1) A good quality paper or bond paper, preferably of [...] size, should be used. Only one side of the sheet is used for typing the materials.

(2) All margins, that is, top, bottom, left and right should be 1½ inches.

(3) Words should not be split at the end of the line unless completing them would definitely interfere with the margin. If a word is to be divided at all, a dictionary should be consulted for the correct syllable division.

(4) Direct quotations of three or fewer typed lines are included in the text and enclosed in quotation marks. Quotations of larger length (that is, more than three lines) are set off from the text in a double-spaced paragraph and indented about five spaces from the left margin without quotation marks.

(5) Pagination information is given in parenthesis at the end of the direct quotation.

(6) Words, letters, digits or sentences which are to be finally printed in italics should be underlined.


EVALUATING A RESEARCH REPORT 


A critical analysis of a research report is a valuable aid for understanding and gaining insight into the nature of problems, methods adopted for solution of the problems, the ways in which data are processed and conclusions reached. For making a critical evaluation of a research report, the following questions may be raised.

1. The title: (a) Is it clear? (b) Is it concise?

2. The problem: (a) Is it clearly stated? (b) Is its significance recognized? (c) Have specific questions been raised? (d) Are testable hypotheses framed? (e) Are assumptions stated? (f) Are limitations recognized? (g) Are important terms defined?

3. Review of the literature: (a) Does it well cover the area? (b) Are the main findings of the previous researches noted? (c) Is it well organized and summarized?

4. Methods: (a) Is the research design appropriate? (b) Is the research design described in detail? (c) Are the samples adequate? (d) Are the extraneous variables well organized and controlled? (e) Are the psychometric properties of data-gathering instruments or questionnaires satisfactory? (f) Are the statistical tests appropriate?

5. Results and discussion: (a) Are the appropriate uses of tables and figures made? (b) Is the discussion in the text clear and concise? (c) Is the data analysis logical and perceptive? (d) Is the statistical analysis correctly interpreted?

6. Summary and conclusion: (a) Is the problem restated? (b) Are the questions or hypotheses restated? (c) Are the procedures well described? (d) Are the findings concisely reported? (e) Are the findings and conclusions based upon the data collected and analyzed?

Using the pattern suggested above, the reader can make a critical analysis of any research report which would, in turn, help him in developing competency in conducting a research and preparing a good report.


PREPARING A RESEARCH PROPOSAL 
The writing of a research proposal is an important aspect of the research process. Before we elucidate the steps involved in preparing a research proposal, it is essential to throw light upon the nature and need for a research proposal.

A research proposal is a detailed plan of the research to be conducted. It is comparable to the blueprint which the architect prepares before the construction of a building commences. The research proposal provides the basis for evaluation of the submitted project. Many institutions require that a proposal must be submitted before the research is finally approved. Researchers often seek financial support for their research work by submitting grant proposals to the government and even private agencies. The major purpose of a research proposal is to ensure a workable experimental design which, when implemented, may result in an analyzable and interpretable piece of research or a finding of significant scientific merit.

Generally, a research proposal follows the general format of a journal article but the headings of various sections are a bit different. The following are the steps that are generally followed in preparing a research proposal. (Many institutions or agencies may suggest some other formats for the research proposals.)

1. Problem
2. Definitions, assumptions, limitations or delimitations
3. Review of related literature
4. Hypothesis
5. Methods
6. Time schedule
7. Expected results
8. References
9. Appendix


These nine steps may be discussed as follows: 


1. Problem: The problem of the research proposal is expressed in a declarative statement but it may be in the question form. The problem must be stated in such a form that it clearly tells about the major goal of the research. Besides its formulation, the researcher must mention why it is worth the time, effort and expense required to conduct the proposed research. In other words, the proposal writer should not only mention the problem clearly but must also demonstrate its significance. A few examples of problem statements are as follows:


(a) Coeducation improves the morality level of students. 


(b) Active participation by students in politics may have damaging effect upon their 
creativity. 

2. Definitions, assumptions, limitations and delimitations: The proposal writer must define all the important variables included in the study in operational terms. These definitions provide a good background with which the researcher approaches the problem. Assumptions implied in the proposed study should also be clearly stated. Assumptions are statements which the researcher believes to be facts but cannot verify. Limitations of the proposed study should also be clearly mentioned. Limitations generally include those factors or conditions which are beyond the control of the researcher but are likely to influence the conclusions of the study. Inability to randomly select subjects for the study and weaknesses of the data-gathering instruments are typical limitations of a study. Delimitations of the proposed research should also be clearly spelled out. Delimitations are the boundaries of the study. In other words, delimitations clearly tell who will be included in the study and for whom the obtained conclusion will be valid.

3. Review of related literature: The research proposal should include a review of the relevant literature. An effective review of literature includes those studies which have been competently executed and clearly reported and are closely related to the present problem. This step ensures that the researcher is familiar with what is already known and what is still unknown and has to be verified and tested. Moreover, it also helps to eliminate the duplication of what has already been done and provides the background for useful suggestions for further investigation. In the search of related literature, the researcher, among others, should concentrate upon reports of competently executed studies, design of the study, sampling methods, population sample, variables defined, extraneous variables controlled, recommendations for further research, etc.

4. Hypotheses: The research proposal should include the major hypothesis to be tested. Some minor hypotheses, if any, should also be stated. Since a research hypothesis is a tentative answer to a question, it is important that the hypothesis is formulated before data are gathered. In fact, the formulation of hypothesis is such a step which clarifies the nature of the problem and also the underlying logic of the research investigation. The hypothesis should be reasonable, consistent with known facts in the concerned area, testable and such that it can be stated in the simplest possible terms.


5. Methods: This part of the research proposal is very important. It includes three 
subsections—subjects, procedures and data analysis. The subjects subsection spells out the 
details of the population from which subjects are to be selected. The total number of subjects 
desired from the population and how they will be selected are generally indicated in this 
subsection. The procedure subsection outlines the details of the research plan. In other words, 
this subsection outlines in detail what will be done, how it will be done, what data will be needed 
and what data-gathering devices will satisfactorily be used. The data-analysis subsection outlines 
the details of the method of analyzing data by different statistical techniques. The details should 
preferably mention the rationale behind selecting the statistical techniques. 


6. Time schedule: An effective research proposal must have a clear time schedule in which the entire project is divided into manageable parts and probable dates are assigned for their completion. Such steps help the investigator in budgeting his time and energy effectively, systematizing the study and minimizing the tendency to delay the completion.


7. Expected results: A good research proposal should also indicate the possible or expected results as far as possible, although in some cases it may prove to be a Herculean task for the investigator to spell out the expected results. The expected results section should include a brief discussion of the anticipated results of the research and should also highlight those that are the most important for the research. In this section, reasonable alternatives to the expected results should also be mentioned, and the problems likely to arise if the results deviate from the research hypothesis should be spelled out.


8. References: The reference section should include the names of the authors along with the details of the publication of their research work. It should be more or less the same section as would be submitted with the final report. Sometimes literature that was not anticipated in the proposal may have to be included in the ‘Discussion’ section of the final report. But this should be an exception and not the rule.


9. Appendix: A research proposal ends with an appendix. An appendix should include a list of all materials that are to be used in the study. Among other things, it may include a copy of the test or scale used, list of stimulus materials and apparatuses, a copy of instructions to be given to the subjects, and so on.

Thus we find that the research proposal has several steps before it reaches its completion. A good research proposer must keep all these steps in view at the time of writing a research proposal.


Review Questions

1. Taking an example of any research project, outline the major steps in writing a research report.
2. Discuss fully the main points to be considered in writing a research proposal.
3. Discuss some of the important considerations to be kept in mind while writing a research report.


Objective Questions 





1. Introduction to Measurement 


1. Which is the correct meaning of measurement? 
(a) Measurement means to measure something correctly. 
(b) Measurement means assigning of numerals. 
(c) Measurement means assigning numerals according to some rules. 
(d) Measurement means assigning numerals arbitrarily. 
2. Which is NOT a property of measurement? 
(a) Measurement involves assigning numerals according to rules. 
(b) Measurement is concerned with some attributes or features. 
(c) In measurement, numerals are used to represent quantities of attribute. 
(d) In measurement, numerals are used to represent qualities of objects. 
3. Who invented the term mental test?
(a) Galton (b) Cattell
(c) Goodenough (d) Pearson
4. If a = p and b = q, then a + b = p + q. This postulate of measurement tells that
(a) the relationship between a and b is asymmetrical.
(b) in the process of adding, identical numbers may be substituted for each other without making change in the result.
(c) in the process of addition, the order of combinations of objects or numbers makes no difference.
(d) the objects which are equal to the same object are also equal to one another.
5. Which is the highest level of measurement?
(a) Nominal measurement (b) Ordinal measurement
(c) Interval measurement (d) Ratio measurement


6. In which of the following scales are all the three basic properties such as magnitude, equal interval and absolute zero incorporated?
(a) Nominal scale (b) Ordinal scale
(c) Interval scale (d) Ratio scale
7. Which can’t be incorporated as one of the functions of measurement?
(a) Selection function (b) Classification function
(c) Review function (d) Comparison function
8. Which is not a true point of distinction between psychological measurement and physical measurement?
(a) In physical measurement, the unit of measurement is not fixed and constant but in psychological measurement this is fixed and constant.
(b) In physical measurement there is true zero point, whereas in psychological measurement there is an arbitrary zero point.
(c) Physical measurement is more accurate and predictable than psychological measurement.
(d) Physical measurement is direct whereas psychological measurement is indirect.


9. Which is not a correct source of error in measurement?
(a) Respondent (b) Measurer
(c) Test situation (d) Presence of people
10. Which is not a general problem of measurement?
(a) Relativity of measurement (b) Incompleteness of measurement
(c) Indirectness of measurement (d) Randomness in measurement

Answers

1. (c) 2. (d) 3. (b) 4. (b) 5. (d) 6. (d) 7. (c)
8. (a) 9. (d) 10. (d)


2. Test Construction 


1. Which is a correct meaning of psychological test? 
(a) Psychological test measures traits/abilities. 
(b) Psychological test is a standardized measure of sample of behaviour. 
(c) Psychological test is only a qualitative measure of behaviour. 
(d) Psychological test is only a quantitative measure of behaviour. 
2. Which is not a characteristic of good psychological test? 
(a) Objectivity (b) Reliability 
(c) Norms (d) Comprehensibility 
3. By standardization of test is meant 
(a) there must be a standard way of giving instructions for maintaining uniformity 
(b) test must have reliability, validity and norms 
(c) test must have uniformity in scoring 
(d) all of the above 
4. A test in which there is a generous time limit so that most examinees are able to attempt it, is known as
(a) verbal test (b) speed test 
(c) power test (d) nonverbal test 
5. The major feature of nonverbal test is that 
(a) it contains pictorial items 
(b) it contains only manipulative items 
(c) it contains both pictorial as well as manipulative items 
(d) it contains only verbal items 
6. Which is not a characteristic of good psychological and educational test? 
(a) Such test is based upon unlimited sample of behaviour. 
(b) Such test usually provides scores or categories, which are subsequently interpreted. 
(c) Such test provides both quantitative and qualitative measurement. 
(d) Such tests are either norm-referenced or criterion-referenced. 


7. The problem part of the item in objective test is known as 


(a) stem (b) response option 
(c) response bias (d) question 
8. The other name of free-answer test is 
(a) objective test (b) essay test 
(c) speed test (d) power test 
9. Which is not a correct use of psychological test? 
(a) In classification (b) In determining ethical issues 
(c) In diagnosis (d) In self-knowledge 


10. Which can’t be considered as one of the limitations of a psychological test?
(a) Psychological test represents an invasion of privacy.
(b) Psychological test penalizes bright and creative examinees.
(c) Psychological test creates anxiety.
(d) Psychological test fails to establish a close contact with examinees.

Answers

1. (b) 2. (a) 3. (d) 4. (c) 5. (a) 6. (a) 7. (a)
8. (b) 9. (b) 10. (d)


3. Item Writing 


1. Who has said that items are the lowest common denominator of a test? 
(a) Bean (b) Anastasi 
(c) Nunnally (d) None of these 


2. For assessing noninterpretative information, what type of item is most suited?
(a) Multiple-choice item (b) Two-alternative item
(c) Matching item (d) None of these
3. For assessing attitude of one person, which type of item formats are frequently used?
(a) Checklist (b) Q-sort
(c) Likert format (d) Category format
4. Identification items are also known as
(a) supply-type item (b) selection-type item
(c) completion-type item (d) none of these
5. Which can’t be included as one of the best characteristics of an item of the test?
(a) Item should not contain ambiguity.
(b) Item should be easy.
(c) Item should not present any difficulty in reading.
(d) Item should have discriminating power.
6. Which one can’t be included in the limitations of the matching item?
(a) Matching items do not measure real understanding of concepts.
(b) Matching items provide opportunity to arrive at the correct answer through the process of elimination.
(c) Matching items are not suited when the purpose is to measure different types of information.
(d) Matching items require longer vision and depth of knowledge in arriving at the correct answer.
7. What is the major purpose of including negative and positive items in the test?
(a) To maintain a balance
(b) To promote practicality in the test
(c) To check acquiescence response set
(d) To enhance reliability of the item
8. For scoring objective items, which method is the most appropriate and correct?
(a) Overlay-key method (b) Likert-key method
(c) Marshall’s item response method (d) Anastasi-key method
9. Double-barreled items are those items which convey
(a) two or more ideas at the same time
(b) two or more attitudes at the same time
(c) two or more emotions at the same time
(d) two or more feelings at the same time
10. An item of the test is best defined as a single task that
(a) can be partially broken down into smaller units
(b) can be wholly broken down into smaller units
(c) can’t be broken down into any smaller units
(d) can be stretched and broken down into two smaller units

Answers

1. (c) 2. (b) 3. (c) 4. (b) 5. (b) 6. (d) 7. (c)
8. (a) 9. (a) 10. (c)


4. Item Analysis 


1. Which is not the correct purpose of item analysis?
(a) It indicates effectiveness of distractors.
(b) It provides information about index of difficulty only.
(c) It provides opportunity to make a particular item more functional.
(d) It provides information about index of discrimination.
2. The proportion giving correct answers to an item is called
(a) difficulty value of item (b) discrimination value of item
(c) correct distractor (d) none of these
3. When the proportion of passing an item in the upper group is lower than the proportion of passing the same item in the lower group, the test constructor tries to
(a) bring modification in the item (b) reject such item from inclusion
(c) accept such item for final inclusion (d) do none of these
4. Which is not an appropriate statistical technique for determining the discrimination index of the item?
(a) Product-moment correlation (b) Biserial r
(c) Phi-coefficient (d) Kendall’s tau
5. The difficulty of an item is influenced by several factors. Which one can’t be included in this list?
(a) Ambiguity of the item
(b) Past experience of examinee
(c) Homogeneity among alternative responses
(d) Dichotomy in item
6. The steepness or slope of the item characteristic curve gives information about
(a) difficulty value of the item (b) discriminating power of the item
(c) effectiveness of distractors of item (d) none of these
7. What is the other name of item response theory?
(a) Latent trait theory (b) Latent-type theory
(c) Latent content theory (d) None of these
8. When the item-total correlation is negative, the slope of item characteristic curve is
(a) positive (b) negative
(c) flat (d) not known
9. When the difficulty value of an item is 1.00 or 0 (zero), the discriminative value of the item will be
(a) high (b) low
(c) negligible (d) none of these
10. Which is not a correct statement?
(a) Distractor affects discriminative ability of the item.
(b) Distractor affects difficulty value of the item.
(c) Distractor affects neither discriminative ability nor difficulty value of the item.
(d) Distractor creates complexity in the meaningfulness of the item.


Answers 


1. (b) 2. (a) 3. (b) 4. (d) 5. (d) 6. (b) 7. (a) 
8. (b) 9. (c) 10. (c) 
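
Questions 2, 3 and 9 above turn on two simple proportions, which can be sketched as follows. This is a minimal illustration with hypothetical response data; the upper-minus-lower difference used here is one common discrimination index and is an assumption, not necessarily the index the chapter itself adopts.

```python
def difficulty_value(scores):
    """Proportion of examinees passing the item (1 = correct, 0 = incorrect)."""
    return sum(scores) / len(scores)


def discrimination_index(upper_scores, lower_scores):
    """Proportion passing in the upper group minus proportion passing in the lower group."""
    return difficulty_value(upper_scores) - difficulty_value(lower_scores)


# Hypothetical responses to one item from the upper and lower scoring groups.
upper = [1, 1, 1, 0]
lower = [1, 0, 0, 0]

print(difficulty_value(upper + lower))             # overall proportion passing
print(discrimination_index(upper, lower))          # positive: item favours the upper group
print(discrimination_index(lower, upper))          # negative: the situation in question 3
print(discrimination_index([1, 1, 1], [1, 1, 1]))  # everyone passes: nothing to discriminate
```

When every examinee passes (or fails) an item, both group proportions are equal, so the index is zero; this is why the discriminative value is negligible at a difficulty value of 1.00 or 0.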


5. Reliability 


1. Reliability is defined as
(a) self-correlation of the test
(b) correlation of the test with some criteria
(c) proportion of error variance in total variance
(d) none of these
2. Which one is true in case of a highly reliable test?
(a) Such test yields an inconsistent result.
(b) Such test yields a consistent result.
(c) Such test produces higher error variance.
(d) None of the above
3. Logically, reliability is defined as proportion of
(a) true variance in total variance
(b) error variance in total variance
(c) zero variance in total variance
(d) high error and true variance in total variance
4. Which does not assess the internal consistency reliability?
(a) Test-retest reliability (b) Split-half reliability
(c) K-R formulae (d) Cronbach’s alpha
5. When you have developed a test containing multiple-scored items and you are interested in calculating internal consistency reliability, which method will you adopt?
(a) Split-half method (b) Cronbach’s alpha
(c) K-R formula (d) Rulon formula
6. When reliability of half test is 0.70, the reliability of test after Spearman-Brown prophecy formula would be
(a) 0.90 (b) 0.75
(c) 0.82 (d) 0.68
7. If the difficulty values of all items of a test are the same, the most appropriate technique of calculating reliability of test would be
(a) K-R 20 (b) K-R 21
(c) Rulon formula (d) Flanagan formula
8. When the reliability coefficient of the test is 0.65, its index of reliability will be
(a) 0.76 (b) 0.81
(c) 0.92 (d) 0.93
9. If you are calculating reliability of difference score, then higher correlation between X and Y will produce
(a) higher reliability of difference score
(b) lower reliability of difference score
(c) moderate reliability of difference score
(d) none of these
10. Which does not affect the reliability of the test?
(a) Difficulty value of the item (b) Length of the test
(c) Homogeneity of the test (d) Standard error of measurement
11. When the reliability coefficient of a test is 1.00, the standard error of measurement will be
(a) 1.00 (b) 0 (zero)
(c) 0.50 (d) none of these
12. What is the source of error variance in split-half reliability?
(a) Item sampling (b) Time sampling
(c) Double sampling (d) None of these
13. Which one is equivalent of the average of all possible split-half reliability coefficients of the test?
(a) Cronbach’s alpha (b) Rulon formula
(c) K-R formula (d) None of these

Answers

1. (a) 2. (b) 3. (a) 4. (a) 5. (b) 6. (c) 7. (b)
8. (b) 9. (b) 10. (d) 11. (b) 12. (a) 13. (a)
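
The numerical answers to questions 6, 8, 9 and 11 can be checked with the standard textbook formulas. The sketch below assumes the usual forms: the Spearman-Brown prophecy formula for a test of doubled length, the index of reliability as the square root of the reliability coefficient, the standard error of measurement as SD × √(1 − r), and the difference-score formula for two measures of equal variance. The function names and the standard deviation of 15 are illustrative assumptions, not values from the text.

```python
import math


def spearman_brown(r_half):
    """Reliability of the full test estimated from the half-test reliability."""
    return 2 * r_half / (1 + r_half)


def index_of_reliability(r):
    """Correlation between obtained and true scores: square root of reliability."""
    return math.sqrt(r)


def standard_error_of_measurement(sd, r):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - r)


def reliability_of_difference(r_xx, r_yy, r_xy):
    """Reliability of X - Y, assuming X and Y have equal variances."""
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


print(round(spearman_brown(0.70), 2))            # question 6
print(round(index_of_reliability(0.65), 2))      # question 8
print(standard_error_of_measurement(15, 1.00))   # question 11 (SD of 15 is hypothetical)
# Question 9: raising the X-Y correlation lowers the reliability of the difference.
print(round(reliability_of_difference(0.90, 0.90, 0.50), 2))
print(round(reliability_of_difference(0.90, 0.90, 0.80), 2))
```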


6. Validity 


1. Which is not the correct property of validity of the test?
(a) Validity is a relative term and therefore, a test is not generally valid.
(b) Validity is not a fixed property of the test.
(c) Validity is a matter of degree.
(d) Validity is the self-correlation of the test.
2. A testing entrepreneur has developed a Big Toe Intelligence test. For assessing your intelligence, he measures your right big toe in centimeters, multiplies that figure by 100 and adds your age. Then he tells about your IQ. Given your knowledge of testing, you know this test is
(a) reliable but not valid
(b) not reliable
(c) valid
(d) probably standardized for abnormal people and not for normal ones
3. If any psychological or educational test yields the same results consistently but does not measure what it intends to measure, it is
(a) valid but not reliable (b) reliable but not valid
(c) neither reliable nor valid (d) both reliable and valid
4. An achievement test must have
(a) concurrent validity (b) content validity
(c) construct validity (d) none of these
5. Content validity requires that
(a) the test items must have sampling validity only
(b) the test items must have item validity only
(c) both ‘a’ and ‘b’ should be there
(d) neither ‘a’ nor ‘b’ is needed at all
6. If a newly constructed intelligence test is being correlated with an already standardized test of intelligence and if it yields high correlation coefficient, it becomes evidence for
(a) predictive validity (b) construct validity
(c) concurrent validity (d) factorial validity
7. Which of the following does not affect validity of the test?
(a) Length of the test
(b) Sample heterogeneity
(c) Ambiguous direction
(d) Standard error
8. If the reliability of the test is 0.80 and the reliability of the criterion is 1.0, the maximum validity would be
(a) 0.75 (b) 0.82
(c) 1.23 (d) 0.89
9. When validation of a completed test is done on a new sample taken from the same population, the process is known as
(a) cross-validation
(b) cross standardization
(c) cross-heterogeneity
(d) cross divergence
10. When a test correlates with its expected referents, the process is known as ...... and when a test correlates poorly with measures with which it should not, the process is known as ......
(a) convergent validity, construct validity
(b) divergent validity, convergent validity
(c) divergent validity, content validity
(d) convergent validity, divergent validity

Answers 


1. (d) 2. (a) 3. (b) 4. (b) 5. (c) 6. (c) 7. (d) 
8. (d) 9. (a) 10. (d) 
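
The answer to question 8 follows from the standard result that a validity coefficient cannot exceed the square root of the product of the reliabilities of the test and of the criterion. A minimal sketch, with the function name `maximum_validity` chosen here only for illustration:

```python
import math


def maximum_validity(test_reliability, criterion_reliability):
    """Upper bound on the validity coefficient: sqrt(r_xx * r_yy)."""
    return math.sqrt(test_reliability * criterion_reliability)


print(round(maximum_validity(0.80, 1.0), 2))  # question 8
```

Because reliability cannot exceed 1.00, the bound itself never exceeds 1.00, which is why option (c) in question 8 can be ruled out without any calculation.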


7. Norms and Test Scales 


1. By formulation of ......, we are able to standardize the use and standardization of a given test.
(a) validity (b) reliability
(c) norms (d) temporal stability
2. When a test score is interpreted by establishing an external standard, such test is known as
(a) norm-referenced test (b) criterion-referenced test
(c) construct validity (d) none of these
3. Suppose you are going to construct test norms for general intelligence of children belonging to the age range of 3 years to 16 years. In such a situation, the most appropriate norms would be
(a) grade norms (b) age norms
(c) percentile norms (d) none of these
4. Deviation IQ is an example of
(a) T score (b) z score
(c) stanine score (d) standard score
5. If the obtained score is 80, standard deviation is 10 and mean is 60, then z score will be
(a) +3.0 (b) +2.0
(c) +1.75 (d) +2.50
6. Which is not a limitation of percentile norms?
(a) Percentile and percentage score are confused.
(b) In percentile norms, there remains inequality of units throughout the percentile scale.
(c) Percentile norms indicate only the person’s relative position in the standardization sample.
(d) Percentile norms do require a big sample.
7. In stanine score the mean is ...... and the standard deviation is ......
(a) 5, 1.96 (b) 6, 2
(c) 5, 2.96 (d) 4, 2.96
8. Which of the following statements is correct?
(a) Deviation IQ is an improvement over traditional IQ.
(b) Deviation IQ is calculated by MA/CA x SD.
(c) Deviation IQ is a standard score.
(d) The concept of deviation IQ was proposed by Stern.
9. When transformation of raw score is done in such a way that all the characteristics of original distribution of raw scores are retained without any change, this process is known as
(a) linear transformation (b) normalized standard score
(c) cumulative frequency distribution (d) none of these
10. The standard about standard score is that
(a) it has a fixed mean and floating standard deviation
(b) it has a fixed mean and fixed standard deviation
(c) it has fixed standard deviation but floating mean
(d) it has both floating mean and floating standard deviation.


Answers 


1. (c) 2. (a) 3. (b) 4. (d) 5. (b) 6. (d) 7. (a) 
8. (c) 9. (a) 10. (b) 
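
The arithmetic behind question 5 of this set can be checked with a short sketch. The function names `z_score` and `rescale` are illustrative assumptions and do not come from the text; the T-score scale (mean 50, SD 10) is used only as a familiar example of a linear transformation to a fixed mean and fixed standard deviation.

```python
def z_score(raw, mean, sd):
    """Standard score: distance of a raw score from the mean, in SD units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


def rescale(z, new_mean, new_sd):
    """Linear transformation of a z score to a scale with a fixed mean and SD."""
    return new_mean + new_sd * z


# Question 5: obtained score 80, mean 60, standard deviation 10.
z = z_score(80, 60, 10)
print(z)                   # 2.0
print(rescale(z, 50, 10))  # 70.0 on a T-score scale (mean 50, SD 10)
```

Because the transformation is linear, the shape of the raw-score distribution is unchanged; only the mean and the unit of measurement are fixed.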


8. Response Set in Test Scores 


1. The deviation set was proposed by
(a) Berg (b) Cronbach
(c) Anastasi (d) Freeman
2. When the testees or examinees give a very unusual or uncommon response, it is known as
(a) cautiousness (b) deviation set
(c) evasiveness (d) extremeness
3. In the popular Rorschach test, which of the following response sets is commonly
observed?
(a) Faking-good (b) Faking-bad
(c) Deviation set (d) Evasiveness
4. When an examinee deliberately paints a negative picture of the self on any test, this
tendency is known as
(a) social desirability tendency (b) social undesirability tendency
(c) social altruistic tendency (d) none of these
5. When the difficulty value of the items of a test is low such as 0.10 or 0.15, the examinee
is most likely to exhibit
(a) tendency to work speedily (b) semantics
(c) tendency to guess (d) cautiousness


6. When the testee is found to respond in some systematic way by choosing any one type
of the response options in each item, he is definitely showing
(a) deviation set (b) acquiescence set
(c) faking-good (d) faking-bad
7. Which statement is not true?
(a) Response sets widen the range of individual differences.
(b) Response sets contribute to the error variance of the test.
(c) Response sets are usually seen in those tests whose items are vague and
unstructured.
(d) Response sets show consistency.
8. When two-alternative response option items are written, the test is likely to suffer from
(a) acquiescence, evasiveness (b) evasiveness, extremeness
(c) faking-good and acquiescence (d) evasiveness and faking-bad
9. For controlling acquiescence fully, psychologists recommend
(a) two-point scales of judgement
(b) three-point scales of judgement
(c) five-point to seven-point scales of judgement
(d) none of these
10. Researchers have revealed that social desirability tendency is related to
(a) need for social approval (b) need for social conforming
(c) need for avoiding social criticism (d) all of these


Answers 


1. (a) 2. (b) 3. (c) 4. (b) 5. (c) 6. (b) 7. (a) 
8. (a) 9. (c) 10. (d) 


9. Measurement of Intelligence, Aptitude and Achievement 


1. Who said, “Intelligence is whatever intelligence test measures”?
(a) Thurstone (b) Binet
(c) Boring (d) Spearman
2. Which of the following has not been included in Thurstone’s primary mental ability test?
(a) Word fluency factor (b) Reasoning factor
(c) Rote memory factor (d) Semantic factor
3. A positive correlation between two or more than two functions indicates the presence of
(a) g-factor (b) s-factor
(c) r-factor (d) z-factor
4. Piaget has defined intelligence on the basis of
(a) assimilation and approximation (b) assimilation and accommodation
(c) accommodation and approximation (d) approximation and approval
5. An expert in the field of intelligence says that there are three different kinds of
intelligence, distinguished by three factors: the content of one’s knowledge, the product
that represents it and the operation that the person performs. This expert agrees with the
viewpoints of
(a) Spearman (b) Guilford
(c) Goddard (d) Gardner
6. If a 10-year-old has a mental age of 14 years, his IQ would be
(a) 140 (b) 130
(c) 120 (d) 100




7. The largest percentage of people taking an IQ test will score in range from 


(a) 100 to 140 (b) 80 to 100 
(c) 90 to 109 (d) 60 to 90 

8. Who introduced the fifth edition of Stanford-Binet test (SB-5)? 
(a) Thorndike (b) Roid 
(c) Terman (d) Merrill 

9. Raven's Progressive Matrices (RPM) is an example of 
(a) verbal test (b) nonverbal test 
(c) performance test (d) culture-fair test 

10. The concept of mental age was proposed by 
(a) Stern (b) Binet 
(c) Thorndike (d) Guilford 

11. Test of creativity measures 
(a) divergent thinking (b) convergent thinking 
(c) both of them (d) none of these 

12. Which is not measured by General Aptitude Test Battery (GATB)? 
(a) Numerical aptitude (b) Clerical perception 
(c) Finger dexterity (d) Abstract reasoning 

Answers 


1. (c) 2. (d) 3. (a) 4. (b) 5. (b) 6. (a) 7. () 
8. (b) 9. (b) 10. (b) 11. (a) 12. (d) 
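
Question 6 of this set rests on the traditional ratio IQ, that is, mental age divided by chronological age and multiplied by 100. The following is a minimal sketch of that arithmetic; the function name `ratio_iq` is an illustrative assumption, not a name used in the text.

```python
def ratio_iq(mental_age, chronological_age):
    """Traditional ratio IQ: (MA / CA) x 100, with both ages in the same unit."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    # Multiply before dividing so whole-number ages give an exact result.
    return 100 * mental_age / chronological_age


# Question 6: a 10-year-old with a mental age of 14 years.
print(ratio_iq(14, 10))  # 140.0
print(ratio_iq(10, 10))  # 100.0 when mental age equals chronological age
```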


10. Measurement of Personality 


1. The first personality inventory was developed in 1912 by 


(a) Woodworth (b) Guilford 
(c) Galton (d) Kaplan 
2. The famous Minnesota Multiphasic Personality Inventory (MMPI) has been developed 
following 
(a) logical content strategy (b) criterion-group strategy 
(c) factor-analytic strategy (d) mixed strategy 
3. MMPI-2 contains 
(a) 550 items (b) 567 items 
(c) 560 items (d) 576 items 
4. Which of the following has not been included in MMPI-2? 
(a) Hysteria (b) Paranoia 
(c) Depression (d) Phobia 
5. Cattell’s 16 PF inventory is based upon 
(a) factor-analytic strategy (b) logical content strategy 
(c) criterion-group strategy (d) mixed strategy 
6. Mohan has unconscious resentment towards his father. Which test might best detect this? 
(a) MMPI-2 (b) Rorschach test 
(c) TAT (d) Guilford-Zimmerman Temperament Survey 
7. In Cattell’s 16 PF test, the symbol Q2 indicates 
(a) imagination (b) impulsivity 
(c) self-sufficiency (d) dominance 


8. Edwards Personal Preference Schedule measures
(a) various types of interests (b) various types of needs
(c) various types of traits (d) various types of attitudes
9. Which is not assessed by EPQ?
(a) Psychoticism (b) Extraversion
(c) Neuroticism (d) Hypochondriasis
10. Researches have shown that antisocial personality and schizoid personality often score
high on ...... of EPQ.
(a) P scale and E scale (b) P scale and N scale
(c) N scale only (d) P scale only
11. Likert scale of attitude measurement is one type of 

(a) nondisguised nonstructured test 


(b) nondisguised structured test 
(c) disguised structured test 


(d) disguised nonstructured test 
12. Which is not assessed by the NEO-Personality Inventory (NEO-PI-R)? 
(a) Openness to experience (b) Agreeableness 


(c) Neuroticism (d) Zestfulness 


Answers 


1. (a) 2. (b) 3. (b) 4. (d) 5. (a) 6. (c) 7. (c) 
8. (b) 9. (d) 10. (d) 11. (b) 12. (d) 


11. Projective Techniques 


1. In modern times, the first projective test was developed by 


(a) Galton (b) Frank 
(c) Rorschach (d) Morgan 
2. Rorschach test measures 
(a) nonintellectual traits (b) intellectual traits 
(c) both (a) and (b) (d) neither (a) nor (b) 
3. Comprehensive Scoring System (CSS) of Rorschach responses was developed by 
(a) Klopfer (b) Rapaport 
(c) Exner (d) Beck 


4. Which of the following can’t be included in the categories of classification of responses 
of Rorschach test? 


(a) Location (b) Determinants 
(c) Content (d) Context 
5. What is the popular response for card No. VI of Rorschach test? 
(a) Bat (b) Animal skin 
(c) Human figure (d) Fish 
6. What does S response in Rorschach test indicate? 
(a) Negativism (b) Ability to perceive and read clearly 
(c) Concentration (d) None of these 
7. In Holtzman Inkblot test, how many cards are there? 
(a) 40 cards (b) 30 cards 
(c) 45 cards (d) 55 cards 
8. In TAT, the maximum number of cards to be administered to any one person is 
(a) 20 (b) 19 
(c) 30 (d) 25 




9. The Children’s Apperception Test (CAT) is used for children 


(a) below 10 years (b) in between 10 to 12 years 

(c) above 12 years (d) in between 13 to 15 years 
10. Machover Draw-a-Person (D-A-P) test belongs to the category of 

(a) verbal techniques (b) expressive techniques 

(c) pictorial techniques (d) constructive techniques 
11. Which of the following is measured by TAT? 

(a) Attitudes (b) Needs 

(c) Prejudices (d) None of these 


12. Which is not a characteristic of projective tests? 
(a) Projective tests are based upon projective hypothesis. 
(b) Projective tests are relatively unstructured. 
(c) Projective tests usually employ a global approach. 
(d) Interpretation of responses of projective tests is influenced by nonpsychoanalytic 
thinking. 


Answers 
1. (a) 2. (c) 3. (c) 4. (d) 5. (b) 6. (a) 7. (c) 
8. (a) 9. (a) 10. (b) 11. (b) 12. (d) 


12. Techniques of Observation and Data Collection 


1. Which is not a characteristic of a good questionnaire? 
(a) The questionnaire should be concerned with general topics. 
(b) The questionnaire should be as far as possible short. 
(c) Directions and wordings of the questions should be simple. 
(d) Questions should be presented from general to specific responses. 
2. What is the correct major point of distinction between questionnaire and interview 
schedule? 
(a) Questionnaire should be a short one whereas an interview schedule should be a long one. 
(b) Questionnaires contain questions, which are read and answered by respondents 
themselves whereas interview schedules contain questions to be read by the 
researcher before respondents who accordingly respond after listening. 
(c) Questionnaires are prepared by the researcher whereas interview schedules are 
prepared by respondents. 
(d) Questionnaire takes more time as compared to interview schedule in collecting data 
3. The success of interview is dependent upon 


(a) accessibility (b) cognition 
(c) motivation (d) all of these 
4. The interview based upon preframed questions to be asked is technically known as 
(a) formal interview (b) informal interview 
(c) semi-formal interview (d) semi-informal interview 
5. Which is not a source of error in interview? 
(a) Direction of interview (b) Attitude of interviewer 
(c) Attitude of the interviewee (d) Lack of warmth in the situation of interview 


6. Which can’t be a purpose of content analysis? 
(a) To analyze the different types of errors in students’ work 
(b) To find out the relative importance of some topics as problems 
(c) To explain the possible causal factors related to some outcome or event 
(d) To analyze about the needs and conflicts of the writers or the person concerned 


7. In naturalistic observation, how is the problem of reactivity dealt with? 
(a) The observer becomes a member of the group. 


(b) Observers are kept hidden or the participants are habituated to the presence of 
observers. 


(c) Participants give informed consent. 

(d) Participants never know they are being observed. 

8. Which is not the correct point of difference between participant observation and 
nonparticipant observation? 

(a) In participant observation the observer actively participates in the activities of the 
group whereas in the nonparticipant observation, active participation by the 
observer does not take place. 

(b) Participant observation is usually unstructured whereas nonparticipant 
observation is usually structured. 

(c) In participant observation the identity of the observer is usually hidden but in 
nonparticipant observation, the observer is usually known to the persons being 
observed. 

(d) In participant observation, the number of participants is larger than the number of 
participants in nonparticipant observation. 

9. If you prefer to study the verbalized attitudes of the participants by using a comparative 
rating method, your ideal choice will be for 


(a) forced-choice scale (b) Q-sort technique 
(c) Likert scale (d) none of these 

10. Semantic differential scale mainly measures 
(a) denotative fact of meaning (b) connotative fact of meaning 
(c) association (d) none of these 

11. Which of the following can’t be properly studied by sociometric method? 
(a) Choices of the individuals (b) Communication pattern 
(c) Interaction pattern (d) Motivational pattern 


12. When the rater is predominantly influenced by a single favourable or unfavourable trait 
of the ratee and accordingly rates him, this constitutes the example of 


(a) error of leniency (b) halo error 
(c) logical error (d) error of severity 
Answers 


1. (a) 2. (d) 3. (d) 4. (a) 5. (d) 6. (d) 7. (b) 
8. (d) 9. (b) 10. (b) 11. (a) 12. (b) 


13. Scaling Techniques 


1. In psychophysical scale, the term PSE stands for 


(a) point of subjective equality (b) point of subject equality 
(c) point of suspect equality (d) point of stupid equality 
2. The error of habituation and the error of anticipation are associated with 
(a) method of average error (b) method of limits 
(c) method of constant stimuli (d) none of these 


3. The relationship between the size of the standard stimulus and the size of just noticeable 
difference (JND) is technically known as 
(a) Fechner’s law (b) Weber’s law 
(c) Stevens’ power law (d) law of JND 


4. Which of the following can be used both as a method of psychophysical scaling as well
as a method of psychological scaling?
(a) Method of category scaling (b) Method of magnitude estimation
(c) Method of paired comparison (d) None of these
5. In the method of equal-appearing intervals, the scale value of the item is indicated by
(a) median (b) standard deviation
(c) quartile deviation (d) none of these
6. Which of the following can’t be considered as a disadvantage of the method of
equal-appearing intervals?
(a) The judges don’t keep the intervals equal.
(b) The attitude of the judges tends to influence the sorting of statements.
(c) In this method there is no objective basis for selecting the most discriminating items.
(d) The judges remain dissatisfied with sorting of all statements in only 11 points.
7. In which scaling technique are items selected on the basis of a rigorous procedure of item
analysis?
(a) Method of summated ratings (b) Method of equal-appearing intervals
(c) Cumulative scale (d) None of these
8. According to Guttman, if the coefficient of reproducibility is between 0.85 and 0.90, it is
said that
(a) quasi-cumulative scale is existing
(b) a perfect cumulative scale is existing
(c) a perfect social distance scale is existing
(d) none of these
9. Under which scaling method is there provision for 5-point response options?
(a) Method of summated rating (b) Method of cumulative scaling
(c) Method of successive categories (d) Method of equal-appearing intervals
10. Law of comparative judgement was formulated by
(a) Likert (b) Thurstone
(c) Stevens (d) Fechner
Answers 


1. (a) 2. (b) 3. (b) 4. (c) 5. (a) 6. (a) 7. (a) 
8. (a) 9. (a) 10. (b) 


14. Sampling 


1. Which is a correct statement? 

(a) Sample is a smaller representation of population. 

(b) Sample is a bigger representation of population. 

(c) Every sample contains many types of error. 

(d) Decision to sample is not influenced by the size of the population. 


2. When all elements in a given population have an equal and independent chance of 


being included in the sample, the process is referred to as 
(a) good sampling (b) random sampling 
(c) nonrandom sampling (d) none of these 


3. Which of the following is a correct statement? 


(a) Systematic sampling is one type of nonprobability sampling. 
(b) Systematic sampling does not help in making correct and scientific judgement. 


(c) Systematic sampling is partially probability sampling and partially nonprobability
sampling.
(d) Systematic sampling takes into consideration all persons.
4. Quota sampling greatly resembles
(a) area sampling (b) stratified random sampling
(c) purposive sampling (d) snowball sampling
5. Difference between parameter and statistic is called as
(a) sampling error (b) error of sampling distribution
(c) error score (d) standard error of difference
6. List of items from which sampling is to be finally drawn is called as
(a) sampling units (b) sampling list
(c) sampling frame (d) none of these
7. Which is not a characteristic of a good sampling design?
(a) The sample design must provide a sample, which would be a true representative of
the population.
(b) Sampling design must control all errors except sample error.
(c) Sampling design must be usable in light of the funds available for the research
study.
(d) The sample design must be such that the results of the sample study can be applied
in general.
8. Another name of accidental or incidental sampling is
(a) convenience sampling (b) purposive sampling
(c) judgemental sampling (d) inferential sampling
9. When sampling is drawn by selecting every nth person from a predetermined list of
individuals, the technique is called as
(a) dense sampling (b) saturation sampling
(c) snowball sampling (d) systematic sampling
10. The numerical value based upon sample is called ...... whereas the numerical value
based upon population is known as ......
(a) statistic, parameter (b) parameter, statistic
(c) statistical error, parametric error (d) parametric error, statistical error
11. What is the advantage of cluster sampling over simple random sampling and stratified
sampling?
(a) Cluster sampling allows the researcher to represent subgroups of the population
accurately.
(b) In cluster sampling, it is not essential to begin with a complete list of the population.
(c) Cluster sampling is the only sampling that could be called as convenience sampling.
(d) Cluster sampling improves probability sampling.
12. If the population is extremely large, which of the following sampling becomes
impractical?
(a) Stratified random sampling (b) Random sampling
(c) Cluster sampling (d) Snowball sampling
Answers 


1. (a) 2. (b) 3. (c) 4. (b) 5. (a) 6. (c) 7. (b) 
8. (a) 9. (d) 10. (a) 11. (b) 12. (b) 
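
The technique named in question 9 of this set, selecting every nth person from a predetermined list, can be sketched as follows. This is only an illustration under stated assumptions: the function name `systematic_sample`, the random starting point within the first interval and the numbered frame of 100 persons are hypothetical choices, not details given in the text.

```python
import random


def systematic_sample(frame, k, rng=None):
    """Pick a random start among the first k positions, then take every k-th element."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random()
    start = rng.randrange(k)  # random position from 0 to k - 1
    return frame[start::k]


# Hypothetical sampling frame: persons numbered 1 to 100, sampling interval of 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, random.Random(0))
print(len(sample))  # 10
print(sample)
```

Only the starting point is chosen at random; every later selection is fixed by the interval, which is one way to read the description of systematic sampling as partly probability and partly nonprobability sampling.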




15. Social Scientific Research 
1. Which is not a relevant statement? 
(a) Research is always directed towards the solution of problem. 
(b) Research is characterized by systematic, objective and logical procedures. 
(c) Research is replicable.
(d) Research needs good communication skills on the part of the researchers.
2. Compared with field research, which of the following is true about laboratory research?
(a) Informed consent is more difficult to gain in laboratory research than in field
research.
(b) Laboratory research achieves greater mundane realism.
(c) For achieving experimental realism, the study needs to be conducted in the
laboratory.
(d) Laboratory research achieves a greater degree of control over the different 
conditions of the experiment. 

3. One study compares three-month and five-month-old children on a perceptual task. 
Another study compares four groups of children of age 3, 5, 7 and 9 months using the 
same task. The second study 
(a) is a good example of replication and extension 

(b) is redundant because there is no need to include the first two groups 
(c) is an example of diverging operation 
(d) has higher degree of experimental reality than the first one 
4. Which research has greater degree of internal validity? 
(a) Laboratory experiment research (b) Field experiment research 
(c) Field study research (d) Ex post facto research 

5. About a year prior to retirement, 150 people volunteer for a programme to reduce high 
blood pressure. Two years after retirement, all 150 people remain in the programme and 
their blood pressure was found to be down. It is probably difficult to interpret these data 
because of what type of threat to internal validity? 

(a) History (b) Instrumentation 
(c) Testing (d) Maturation 

6. In an experimental study of memory, the researcher hypothesizes that using visual 
imagery enhances recall of words. Some participants were told to create images and 
some were told to repeat each word two times. Since forming images takes longer than 
repeating, those in the imagery group are shown the words at a slower presentation rate 
(5 seconds per word) than those in the repetition group (3 seconds per word). Which of 
the following is correct about this study? 

(a) It has internal validity but lacks external validity. 

(b) The rate of the presentation is the independent variable. 

(c) Presentation rate is confounded with instructional independent variable. 

(d) It is a good study with an instructional variable as the manipulated independent 
variable. 

7. For an experimental group in a programme to reduce anxiety, the mean pretest score is 
89 (maximum = 100) and the mean post-test score is significantly lowered, that is, 80. 
This change is probably due to 
(a) maturation (b) regression 
(c) programme effectiveness (d) any of the above (or some combination) 

8. Which is not a proper method of controlling threats to reliability and validity in research? 
(a) Reliable manipulation (b) Pilot studies 
(c) Defence against diffusion of treatment 

(d) Statistical analysis 


9. The major advantage of interview over written and phone surveys is that interview
(a) is less expensive (b) avoids the problem of prejudice and bias
(c) yields information in detail (d) never has sampling problem


10. Which is true regarding case study as a method? 
(a) It is a correct and reasonable choice for dealing with rare individuals. 
(b) Experimenter bias is not likely to be a problem. 
(c) Case study allows to draw conclusion based on cause and effect. 
(d) Case study allows in-depth studies of individual persons. 
11. After the general election is over in the country, a social psychologist interviewed 400 
voters to study their attitudes towards the winning party. This becomes an example of 


(a) correlational study (b) field experiment 
(c) ex post facto research (d) voting behaviour experiment 
12. The common description, “I-wonder-what-would-happen-if-I-did-this” relates to 
(a) crucial experiment (b) confirmatory experiment 
(c) exploratory experiment (d) pilot experiment 
Answers 


1. (d) 2. (d) 3. (a) 4. (a) 5. (a) 6. (c) 7. (d) 
8. (d) 9. (c) 10. (a) 11. (c) 12. (c) 


16. Single-subject Experimental Research and Small N Research 


1. Why do we prefer an A-B-A-B design to an A-B-A design? 
(a) Because it evaluates the treatment effect twice rather than once 
(b) Because it includes a withdrawal of treatment 
(c) Because it compares contingent with noncontingent reinforcement 
(d) Because it is more parsimonious 

2. When is a multiple baseline design preferred over an A-B-A-B design? 
(a) When the purpose is to study more than a single individual 

(b) When withdrawing treatment is not feasible for some reason 

(c) When the purpose is to compare two different treatment strategies in the same 
participant 

(d) When the researcher feels that the target behaviour can’t be researched all at once 

3. Which one of the following arguments is not made in favour of using small N design? 

(a) When data for many subjects are averaged, the emerging picture does not reflect 
the behaviour of any single individual. 

(b) If the precise control of conditions yields orderly behaviour then emphasis should 
be given on controlling environmental conditions for single subject rather than on 
achieving statistical controls. 

(c) It is not possible to research on relatively rare psychological phenomenon except 
by using small N design. 

(d) Because of individual differences, inferential statistics are needed to determine if 
the independent variable is responsible for some changes in the dependent 
variable. 

4. When the researcher wants to apply treatment in sequence to the same class of 
behaviour in different participants who are in the same environment, his most preferred 
design would be 
(a) reversal design (b) multiple-baseline design across subjects 
(c) multiple-baseline design across behaviours 
(d) withdrawal design 


5. When a researcher wants to determine the status of the subject’s behaviour prior to the
intervention, he will use
(a) basephase (b) baseline
(c) base design (d) multiple-baseline designs
6. Which is not a limitation of single N research?
(a) Lack of external validity (b) Order effects
(c) Baseline problems (d) Control of experimental situation
7. Which is not a type of withdrawal design?
(a) A-B-A design (b) A-B-A-B design
(c) A-B design (d) A,-B,, B,-B, design
8. Which is not a good data collection strategy in single-subject research?
(a) Method of interval recording (b) Frequency measure
(c) Method of common error (d) Response-specific measure
9. For mentally retarded children, two treatment programmes were started and the
researcher wishes to evaluate these programmes. The task was to get the children to make
their beds properly each morning. One treatment programme used a token economy
system in which points can be earned and traded for reinforcers. The
second treatment programme relied on the use of staff attention as a reinforcer, in which
the bed making is reinforced by giving the child extra attention. For evaluating the
effectiveness of the strategies, which would be the best design?
(a) Withdrawal design (b) A-A1-B-A1-B design
(c) A-B-C design (d) Alternating treatment design
10. For changing the self-stimulation behaviour of an autistic child, a reinforcement plan is
put into effect in three different settings but is staggered so that contingencies are
introduced in one setting at a time. Here the researcher is using
(a) multiple-baseline design (b) reversal design
(c) A-B-A design (d) A-B-A-B design
Answers 


1. (a) 2. (b) 3. (a) 4. (b) 5. (b) 6. (d) 7. (d) 
8. (c) 9. (d) 10. (a) 


17. Historical Research 


1. Which is a correct statement? 


(a) Historical research is a philosophical investigation of the past events. 

(b) Historical research is the critical investigation of events and experiences of the past. 
(c) In historiography, the researcher pays attention to only internal criticism. 

(d) In historiography, the investigator pays attention to only external criticism. 


2. Which is not a correct statement? 


(a) Historiography is relevant because all the data used in social sciences are drawn from 
the recorded experiences of the past. 

(b) The thinking of all workers in the social sciences and education is determined by 
the historical circumstances of their lives. 

(c) All policies respecting human affairs in social sciences and humanities involve 
interpretation of the past. 

(d) Social scientists can’t interpret any of the behaviour without direct reference to the
past.


3. Oral testimony is one of the
(a) secondary source of data (b) primary source of data
(c) convincing source of data (d) none of these
4. Which is a correct statement?
(a) Historical research is quantitative in nature.
(b) Historical research is qualitative in nature.
(c) Historical research is both quantitative and qualitative in nature.
(d) Historical research is neither quantitative nor qualitative in nature.
5. Which is an eyewitness account in historical research?
(a) Relics or remains (b) Oral testimony
(c) Documents (d) All of these
Answers 


1. (b) 2. (d) 3. (b) 4. (c) 5. (d) 


18. The Problem and the Hypothesis 


1. Which is not a characteristic of research problem? 
(a) A problem statement is usually written in an affirmative form. 
(b) A problem statement expresses the relationship between two or more than two 
variables. 
(c) A problem statement should be testable by empirical method. 
(d) A problem statement should avoid moral or ethical judgement. 
2. According to P V Young, which is not a source of locating a research problem? 
(a) Documentary source (b) Personal source 
(c) Library source (d) Mass-media source 
3. A problem is manifested in the following ways except 
(a) a noticeable gap in the results of investigations 
(b) contradictory results of investigations 
(c) isolated fact in the form of unexplained information 
(d) well-settled fact in the area 
4. ‘Is there a divinity that shapes our ends?’ is an example of 
(a) solvable problem (b) partly solvable problem 
(c) unsolvable problem (d) partly unsolvable problem 
5. Which is not a characteristic of good hypothesis? 
(a) Hypothesis should be tenable. 
(b) Hypothesis should be economical and parsimonious. 
(c) Hypothesis should be specific in scope. 
(d) Hypothesis should have logical unity. 
6. A researcher often experiences difficulties in formulating hypothesis. Which of the 
following can’t be included in the list of difficulties? 
(a) Absence of knowledge 
(b) Lack of ability to utilize the knowledge of theoretical framework 
(c) Lack of knowledge about scientific research technique 
(d) Lack of proper training 
7. Which is not a correct statement? 
(a) A research hypothesis is derived from the researcher’s theory. 
(b) Null hypothesis is no-effect or negative hypothesis. 
(c) Research hypothesis is the reverse of alternative hypothesis. 
(d) A universal hypothesis is one in which the stated relationship holds good for all the 
levels or values of variable. 




8. Which is not a correct function of hypothesis? 

(a) Hypothesis suggests various theories. 

(b) Hypothesis tests various theories. 

(c) Hypothesis tends to describe social phenomena. 

(d) Hypothesis provides basis for evaluating supernatural phenomena. 
9. A hypothesis is generated from various sources except 

(a) from various opinions, experiences and observations 

(b) by retesting of hypothesis previously tested 

(c) from existing research itself 

(d) from observing the behaviours of others. 


10. A laboratory experimenter, in general, prefers to formulate a hypothesis called as 
(a) existential hypothesis (b) universal hypothesis 
(c) causal hypothesis (d) descriptive hypothesis 


Answers 


1. (a) 2. (d) 3. (d) 4. (c) 5. (c) 6. (d) 7. (c) 
8. (d) 9. (d) 10. (c) 


19. Reviewing the Literature 


1. Which can’t be a purpose of review of literature? 
(a) Avoidance of repetition 
(b) Determining relevance of hypothesis 
(c) Identifying variables relevant for research 
(d) Synthesis of prior works 


2. All of the following are the sources of the review except 


(a) books (b) journals 
(c) internet (d) television 

3. A literature, either published or unpublished, bearing no ISBN or ISSN is technically 
called as 


(a) white literature 


(b) grey literature 
(c) black literature 


(d) none of these 


4. An ideal index card for reviewing and abstracting generally contains 
(a) 150 words approximately (b) 160 words approximately 
(c) 200 words approximately (d) 250 words approximately 
5. In psychological research, a very popular psychological abstract was first published in 
(a) 1927 (b) 1937 
(c) 1910 (d) 1920 
Answers 


1. (b) 2. (d) 3. (b) 4. (a) 5. (a) 


20. Variables 


1. If the researcher wants to study the impact of the level of illumination upon reading
ability of a person, the dependent variable here will be
(a) reading ability (b) level of illumination
(c) researcher himself (d) none of these


2. The variable which is directly manipulated by the experimenter is known as
(a) type-S independent variable (b) type-E independent variable
(c) type-S dependent variable (d) type-E dependent variable

3. Sometimes the researcher manipulates a variable only for the sake of controlling its 
unwanted effect upon the experimental results. Such variable is known as 
(a) extraneous variable (b) irrelevant variable 
(c) controlled independent variable (d) none of these 


4. Members of the family constitute an example of


(a) quantitative variable (b) categorical variable 

(c) continuous variable (d) none of these 
5. Which is not a correct method of controlling extraneous variable? 

(a) Randomization (b) Balancing 

(c) Constancy of conditions (d) Constancy of variables 
6. Which does not produce demand characteristics? 

(a) Experimenter (b) Subject 

(c) Measurement procedures (d) Technique of elimination 
7. A variable which is not manipulated but measured by the experimenter is known as 

(a) intervening variable (b) attribute variable 

(c) active variable (d) none of these 


8. When a researcher uses statistics properly and draws the appropriate conclusions from 
the statistical analysis, the process is named as 
(a) statistical validity (b) statistical conclusion validity 
(c) statistical answer validity (d) validity of inferential statistics 

9. Which is not the appropriate method of manipulating independent variables? 
(a) By manipulating social setting in which confederates are used 
(b) By manipulating likely information to be given to the subjects 
(c) By manipulating intervening variables 
(d) By manipulating experimenter’s role 

10. Which of the following is a continuous variable? 


(a) Reaction time (b) Religion 
(c) Race (d) Caste 
Answers 


1. (a) 2. (b) 3. (a) 4. (b) 5. (d) 6. (d) 7. (b)
8. (b) 9. (d) 10. (a) 


21. Research Design 


1. The major advantage of between-subjects (or groups) design over within-subjects (or 
group) design is that the former design 
(a) avoids the problem of equivalent groups 
(b) reduces the extent of error variance between conditions
(c) requires fewer subjects 
(d) avoids the problem of sequence effects 


2. When asymmetric transfer effect occurs 
(a) the matching must be used as true method for creating equivalent groups 
(b) complete counterbalancing must be used 
(c) partial rather than complete counterbalancing must be used
(d) counterbalancing may not prove fruitful in eliminating the sequence effects 


3. As a technique of creating equivalent groups, the researcher prefers matching over
random assignment

(a) wherever a measurable extraneous variable is known to correlate with the
dependent variable

(b) every time when a larger number of participants is available 

(c) whenever potential confounding effect exists 

(d) in all the above conditions 


4. The experimenter expectancy effects


(a) have never been replicated after Rosenthal’s research, so they are not real 
problems 

(b) can be easily reduced by automating the procedure as much as possible 

(c) are often found in research utilizing human subjects but not in researches utilizing 
animal subjects 

(d) do not occur if the participants are unaware of the hypothesis being tested


5. Block randomization is often used


(a) for counterbalancing when the participants are tested more than once per
condition in a within-subjects design

(b) for achieving random assignment while ensuring that an equal number of 
participants are assigned to each group 

(c) none of the above

(d) for both a and b 

6. Which design is illustrated by the case study comparing average children and gifted
children on a social and emotional problem-solving task?

(a) Independent groups (b) Matched group 

(c) Within-subjects, multilevel (d) Nonequivalent groups 


7. What do all repeated-measures designs (single factor) have in common?


(a) Every participant will be tested in each of the conditions of the study. 

(b) For creating equivalent groups, matching will be preferred. 

(c) They will always have a control group. 

(d) The participants will be tested in each condition of the study and will always be 
tested more than once per condition. 


8. A researcher predicts that an extrovert will do better on a problem-solving task with an
audience rather than when alone. Introverts, however, are expected to perform better in
the alone condition than with an audience. Here the researcher is

(a) using a mixed factorial design 

(b) expecting an interaction effect to occur 

(c) using two manipulated variables 

(d) trying to predict a main effect for the audience variable 


9. A 2 × 2 mixed factorial design includes


(a) one manipulated factor and one nonmanipulated factor

(b) a within-subjects variable and a repeated-measures variable 

(c) a between-subjects factor and a repeated-measures variable 

(d) a two-subject variable and a three-subject variable 

10. In a maze-learning experiment the experimenter took 60 rats, of which 30 rats were
tested with the light-on condition and 30 rats were tested with the light-off condition. Also,
within each of the two groups, 10 rats were given food immediately after they reached
the goal, 10 others were reinforced 8 seconds after they reached the goal and the
remaining 10 were fed 12 seconds after reaching the goal. Which statement about this
design is true?

(a) It is a 2 × 2 independent groups design.

(b) It is a 2 × 3 mixed design.

(c) Six different conditions are being tested. 

(d) Both independent variables are subject factors. 

11. A 2 × 3 × 5 design has

(a) dependent variables with two, three and five levels respectively 

(b) 30 different independent variables 

(c) 30 different conditions 

(d) 10 different independent variables 

12. A 3 × 3 mixed factorial design uses five participants in each cell of the design. How many
participants are needed for this purpose?

(a) 9 (b) 27 

(c) 45 (d) 15 

13. In correlational studies, the extraneous variables are not controlled. Consequently, a
problem is created. Which of the following is that problem? 

(a) Range restriction problem 

(b) Third variable problem 

(c) Directionality problem 

(d) Problems relating to measurement of the dependent variable

14. A cross-lagged panel technique is sometimes used to solve problems relating to

(a) directionality (b) third variable 

(c) negative correlation (d) positive correlation 

15. In a nonequivalent control group design

(a) on the basis of results the researcher is able to evaluate the linear trends 

(b) it is highly essential that the scores of both groups be the same on the pretest

(c) the most important measure is usually the amount of change between pretest and
posttest

(d) threats to internal validity by selection and history can be easily ruled out because
of the control group


16. Suppose the Bihar state wants to implement strictly the law relating to wearing helmets
by riders on motorcycles because of a recorded number of head injuries. Then which of
the following will be true?

(a) Comparison with a similar state that does not have the law would help assess 
potential regression effects. 

(b) The number of head injuries is likely to decline the following year. 

(c) Evaluation calls for an interrupted time series design. 

(d) All of the above 


Answers 
1. (d) 2. (b) 3. (a) 4. (b) 5. (d) 6. (d) 7. (a)
8. (b) 9. (c) 10. (c) 11. (c) 12. (d) 13. (b) 14. (a)
15. (c) 16. (d)
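
The arithmetic behind questions 10 to 12 can be sketched in a few lines of Python. The number of conditions in a factorial design is the product of the levels of its factors, and the number of participants depends on which factors are between-subjects. The function names below are illustrative only, and the 3 × 3 mixed example assumes one between-subjects factor and one within-subjects factor, which the question itself does not spell out.

```python
from math import prod


def count_conditions(levels):
    """Number of distinct conditions (cells) in a factorial design.

    levels: number of levels of each independent variable, e.g. [2, 3, 5].
    """
    return prod(levels)


def participants_needed(levels, is_within, per_cell):
    """Participants required when each cell holds `per_cell` participants.

    is_within: one flag per factor, True if the factor is within-subjects
    (repeated measures). Only between-subjects factors multiply the number
    of separate groups, because a within-subjects factor reuses the same
    participants at every level.
    """
    between_cells = prod(
        n for n, within in zip(levels, is_within) if not within
    )
    return between_cells * per_cell


# Question 11: a 2 x 3 x 5 design has 2 * 3 * 5 conditions.
conditions_q11 = count_conditions([2, 3, 5])

# Question 10: 2 (light) x 3 (delay) independent groups, 10 rats per cell.
rats_q10 = participants_needed([2, 3], [False, False], per_cell=10)

# Question 12 (assumed: first factor between, second factor within).
participants_q12 = participants_needed([3, 3], [False, True], per_cell=5)
```

Under these assumptions the sketch reproduces the keyed answers: 30 conditions for question 11, six conditions and 60 rats for question 10, and 15 participants for question 12.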


22. Qualitative Research 


1. Which is not true about qualitative research?

(a) Qualitative research is based upon phenomenological paradigm. 

(b) Qualitative researchers use highly standardized instruments at the outset. 

(c) In qualitative research, most of the analysis is done with words. 

(d) In qualitative research, the researcher aims to gain a holistic view of the context 
under study. 


2. Which of the following questions is not answered by qualitative research?
(a) What does the qualitative researcher actually do? 

(b) What does the researcher want to understand at the end of the study? 
(c) What are the contexts in which the study is done? 

(d) What rigorous research design should be adopted in the context? 


3. Which method of sampling is generally preferred by a qualitative researcher?


(a) Quota sampling (b) Purposive sampling 
(c) Snowball sampling (d) Cluster sampling 
4. When a researcher explores and analyzes the data without a prior hypothesis, it is
called
(a) context sensitivity (b) inductive analysis
(c) deductive analysis (d) hypothetical analysis


5. In which of the following does the researcher try to answer the question: How do persons
try to accomplish their goals through specific behaviours in specific environments?


(a) Ethnomethodology (b) Heuristic enquiry

(c) System theory (d) Ecological psychology 

6. When the researcher tries to generate theory from data, the process is referred to as
(a) ethnography (b) grounded theory 

(c) secondary theory (d) primary theory 

7. Which is not used by the qualitative researcher?

(a) Grounded theory (b) Retrospective studies

(c) Content analysis (d) Factor analysis 


8. Which is not a characteristic of case study?

(a) A case is a bounded system.

(b) In case study, no attempt is made to preserve the wholeness and integrity of the case.
(c) In case study, multiple data collection methods are used in naturalistic setting. 
(d) The case is case of something. 

9. When the researcher takes cases, groups or materials according to their relevance for the
theory that is developed and on the background of what is already the state of 
knowledge after collecting and analyzing a certain number of cases, the procedure is 
known as 


(a) grounded theory (b) theoretical sampling 
(c) typical case sampling (d) sensitive case sampling 
10. Which is not a component of qualitative data analysis according to Miles and Huberman?
(a) Data display (b) Data reduction 
(c) Drawing conclusion (d) Interpreting the results 
Answers 


1. (b) 2. (d) 3. (b) 4. (b) 5. (d) 6. (b) 7. (d)
8. (b) 9. (b) 10. (d) 


23. Carrying out Statistical Analysis 


1. Which is not a characteristic of normal curve?
(a) A normal curve is continuous. 

(b) A normal curve is asymptotic to the x-axis. 
(c) A normal curve is usually bimodal. 

(d) A normal curve is symmetrical. 


2. In a normal curve, what percentage of cases lies within ±2σ?


(a) 95.44% (b) 95.00% 

(c) 84.44% (d) 68.26% 

3. When the researcher estimates parameters using a single value, these estimates are
known as 

(a) interval estimation 


(b) point estimation 
(c) confidence estimation 


(d) none of these 

4. When null hypothesis is false and the researcher fails to reject this hypothesis, this
constitutes the example of

(a) Type II error (b) Type I error

(c) Type III error (d) none of these


5. Which one measures the degree to which the relation between two variables X and Y is
one-directional or monotonic?
(a) Pearson correlation (b) Kendall's partial correlation 
(c) Coefficient of contingency (d) Spearman correlation 


6. What will happen when the researcher reduces alpha level?


(a) The risk of Type I error is reduced. 
(b) The risk of Type II error is reduced. 
(c) The risk of Type I error is enhanced. 
(d) The risk of Type II error is enhanced. 


7. Which is not used as post hoc test?
(a) Tukey’s HSD (b) Duncan multiple range test
(c) Newman-Keuls test (d) Chi-square test


8. Which is not a correct assumption of ANCOVA?

(a) Treatment groups are selected at random from the same population. 

(b) X-scores (covariate) are affected by the treatment. 

(c) Within-group variances should be approximately equal.

(d) The dependent variable that is under measurement should be normally distributed 
in the population. 

9. Which is the measure of absence of relationship between X and Y?

(a) Coefficient of alienation (b) Coefficient of determination 

(c) Coefficient of distribution (d) Coefficient of linear regression 


10. In a regression equation, the difference between a score and its expected value
according to the theoretical model is known as


(a) third variable explanation (b) residual 
(c) standard error of estimate (d) none of these 
11. In regression analysis, the sum of residuals is always equal to
(a) +2 (b) -2 
(c) +1 (d) 0 
12. In the general regression equation, the slope of the regression line is indicated by
(a) y − bx (b) y = a + bx
(c) only b (d) only a
Answers 


1. (c) 2. (a) 3. (b) 4. (a) 5. (d) 6. (a) 7. (d) 
8. (b) 9. (a) 10. (b) 11. (d) 12. (c) 
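
Several of the answers above (questions 2, 9, 11 and 12) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the small data set are illustrative assumptions, not material from the book. It computes the proportion of cases within ±kσ of the mean of a normal curve, fits the line y = a + bx by ordinary least squares, confirms that the residuals sum to zero (up to rounding), and evaluates the coefficient of alienation as the square root of 1 − r².

```python
import math
from statistics import NormalDist, mean


def proportion_within(k):
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope of the regression line
    a = my - b * mx        # intercept
    return a, b


def residuals(x, y):
    """Differences between each observed score and its predicted value."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def coefficient_of_alienation(r):
    """Degree of absence of relationship for a correlation coefficient r."""
    return math.sqrt(1 - r * r)


# Hypothetical scores, used only to exercise the functions.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

within_2sd = proportion_within(2)            # about 0.9545
a, b = fit_line(x_scores, y_scores)          # b is the slope
residual_sum = sum(residuals(x_scores, y_scores))
k_value = coefficient_of_alienation(0.6)     # sqrt(1 - 0.36)
```

The computed proportion for ±2σ is about 95.45 per cent; the 95.44 per cent in option (a) matches the figure usually read from printed normal-curve tables. The residual sum comes out as zero apart from floating-point rounding, which is why a tolerance rather than exact equality is the safer check.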




24. Writing a Research Report and a Research Proposal 


1. In writing a research report, what is placed immediately before introduction?
(a) Methods of the study (b) Hypothesis
(c) Abstract (d) None of these


2. Which is a correct statement?


(a) Bibliography and references are the same thing. 

(b) Bibliography contains a detailed description of the history of the researcher. 

(c) References contain the details of the research reports of only those authors whose 
work has been included. 

(d) According to APA, references are usually placed at the end of each elec of the 
research report. 


3. In writing a research report in the field of psychology and education, we usually follow
the guidelines of

(a) Publication Manual of APA (b) Chicago Manual of Style 

(c) A manual for writing of Term papers, theses and dissertations 

(d) The MLA Handbook for writers of research papers. 

4. Which portion of the research report ensures that the researcher is familiar with what is
already known and what is still unknown and to be verified and tested? 

(a) Introduction (b) Discussion 

(c) Review of literature (d) Conclusion 

5. Which section of the research report or research proposal is known as the ‘heart’ of the
research work? 


(a) Introduction (b) Methods 
(c) Results (d) Discussion 
Answers 


1. (c) 2. (c) 3. (a) 4. (c) 5. (b)


APPENDICES 


APPENDIX A: MAJOR EVENTS IN THE HISTORY OF MEASUREMENT AND PSYCHOLOGICAL TESTING


206 BC Chinese start Civil Service Examinations.
1838 Jean Esquirol distinguishes between mental illness and mental retardation.
1862 Wilhelm Wundt uses a calibrated pendulum to measure the “speed of thought.”
1866 O Edouard Seguin writes the first major textbook on the assessment and treatment of mental retardation.
1879 Wundt founds the first experimental laboratory in psychology at Leipzig University, Germany, now known as Karl Marx University.
1879 Galton develops the first projective technique, a Word Association Test.
1885 Sir Francis Galton establishes the first Mental Testing Centre in London.
1890 James McKeen Cattell uses the term mental test in announcing the agenda for his Galtonian test battery.
1896 Emil Kraepelin provides the first comprehensive classification of mental disorders.
1901 Clark Wissler discovers that Cattellian “brass instruments” tests have no correlation with college grades.
1904 Charles Spearman proposes that intelligence consists of a single general factor g and numerous specific factors s1, s2, s3, and so forth.
1904 Karl Pearson formulates the theory of correlation.
1905 Alfred Binet and Theodore Simon invent the first modern intelligence test.
1908 Henry H Goddard translates the Binet-Simon scales from French into English.
1912 Stern introduces the IQ, or intelligence quotient: the mental age divided by chronological age.
1916 Lewis Terman revises the Binet-Simon scales, publishes the Stanford-Binet; revisions appear in 1937, 1960 and 1986.
1917 Robert Yerkes spearheads the development of the Army Alpha and Beta examinations used for testing WWI recruits.
1917 Robert Woodworth develops the Personal Data Sheet, the first personality test.
1920 The Rorschach Inkblot Test is published.
1920 Culture Fair Intelligence Test, a nonverbal measure of fluid intelligence, is conceived by Cattell.
1921 Psychological Corporation, the first major test publisher, is founded by Cattell, Thorndike and Woodworth.
1924 Indian Psychological Association is founded.
1926 Florence Goodenough publishes the Draw-A-Man Test.
1926 The first Scholastic Aptitude Test is published by the College Entrance Examination Board.
1927 The first edition of the Strong Vocational Interest Blank is published.
1931 Allport-Vernon Study of Values is released.
1935 The Thematic Apperception Test is published by Morgan and Murray at Harvard University.




1936 Edgar Doll publishes the Vineland Social Maturity Scale for assessment of adaptive behaviour in those with mental retardation.
1938 L L Thurstone proposes that intelligence consists of about seven group factors known as primary mental abilities.
1938 Raven publishes the Raven’s Progressive Matrices, a nonverbal test of inductive reasoning intended to measure Spearman’s g factor.
1938 Lauretta Bender publishes the Bender Visual Motor Gestalt Test, a design-copying test of visual-motor integration.
1938 Oscar Buros publishes the first Mental Measurements Yearbook.
1938 Arnold Gesell releases his scale of infant development.
1939 The Wechsler-Bellevue Intelligence Scale is published; revisions are published in 1955 (WAIS), 1981 (WAIS-R), and 1997 (WAIS-III).
1939 The Kuder Preference Record, a forced-choice interest inventory, is published.
1943 The Minnesota Multiphasic Personality Inventory (MMPI) is published.
1947 The famous Differential Aptitude Test is released. Its fifth edition was released in 1992.
1948 Office of Strategic Services (OSS) uses situational techniques for selection of officers.
1949 The Wechsler Intelligence Scale for Children is published; revisions are published in 1974 (WISC-R) and 1991 (WISC-III).
1949 Cattell’s 16 PF test is released.
1949 Psychological Research Wing of the Defence Science Organization of India comes into being.
1950 The Rotter Incomplete Sentences Blank is published.
1951 Lee Cronbach introduces coefficient alpha as an index of reliability (internal consistency) for tests and scales.
1952 American Psychiatric Association publishes the Diagnostic and Statistical Manual (DSM-I).
1953 Stephenson develops the Q-technique for studying the self-concept and other variables.
1955 National Institute of Mental Health and Neurosciences (NIMHANS) is established in Bangalore, India.
1956 The Halstead-Reitan Test Battery begins to emerge as the premiere test battery in neuropsychology.
1957 C E Osgood describes the semantic differential.
1957 California Psychological Inventory (CPI) is developed to measure dimensions of Normal Personality.
1958 Lawrence Kohlberg publishes the first version of his Moral Judgement Scale; research with it expands until the mid-1980s.
1959 Campbell and Fiske publish a test validation approach known as the multitrait-multimethod matrix.
1963 Raymond Cattell proposes the theory of fluid and crystallized intelligences.
1968 American Psychiatric Association publishes DSM-II.
1969 Nancy Bayley publishes the Bayley Scales of Infant Development (BSID). The revised version (BSID-2) is published in 1993.
1969 Arthur Jensen proposes the genetic hypothesis of African American versus white IQ differences in the Harvard Educational Review.




1971 George Vaillant popularizes a hierarchy of 18 ego adaptive mechanisms and describes a methodology for their assessment.
1972 First Survey of Research in Psychology in India is published.
1974 Rudolf Moos begins publication of the Social Climate Scales to assess different environments.
1974 Friedman and Rosenman popularize the Type A coronary-prone behaviour pattern; their assessment is interview-based.
1978 Jane Mercer publishes SOMPA (System of Multicultural Pluralistic Assessment), a test battery designed to reduce cultural discrimination.
1985 NEO-PI is published by Costa and McCrae. Its revised edition is released in 1992.
1985 The American Psychological Association (APA), the American Educational Research Association (AERA) and the National Council on Measurement in Education (NCME) jointly publish the influential Standards for Educational and Psychological Testing. A revised version appears in 1999.
1985 Sparrow and others publish the Vineland Adaptive Behavior Scales, a revision of the pathbreaking 1936 Vineland Social Maturity Scale.
1987 American Psychiatric Association publishes DSM-III-R.
1988 Psychology in India (Three volumes) is released.
1989 The Lake Wobegon Effect is noted: Virtually all states of the union claim that their achievement levels are above average.
1989 National Academy of Psychology (NAOP), India, is founded.
1989 Luria-Nebraska Neuropsychological Battery is released. Its third edition is released in 1989.
1989 The Minnesota Multiphasic Personality Inventory-2 is published.
1992 American Psychological Association publishes a revised Ethical Principles of Psychologists and Code of Conduct (American Psychologist, December 1992).
1994 American Psychiatric Association publishes DSM-IV.
1994 Herrnstein and Murray revive the race and IQ heritability debate in The Bell Curve.
1999 APA and other groups publish revised Standards for Educational and Psychological Testing.
2000 DSM-IV (TR) is published.
2000 Psychology in India Revisited (Three volumes) is released.
2003 Fifth Edition of the Stanford-Binet Intelligence test (SB5) is released.
2003 New revision of APA Ethical Principles of Psychologists and Code of Conduct goes into effect.
2009-11 Psychology in India (Four volumes) is released.
2013 American Psychiatric Association publishes DSM-V.







GLOSSARY

A-B design: A type of small N design in which a baseline period (that is, A) is followed by a
treatment period (that is, B).

A-B-A design: A type of small N design in which baseline period (A) is followed by a treatment
period (B), followed by a second A period in which the treatment is either reversed or withdrawn.

ABBA counterbalancing: A technique for controlling practice effects in the complete
within-subjects design, in which conditions in one sequence are presented and then followed
by the opposite of the same sequence.

A-B-C-B design: A small N design that makes comparison of contingent reinforcement (B) with
noncontingent reinforcement (C) and subsequently, the researcher is allowed to separate the
effects of reinforcers and contingency.

Abscissa: The horizontal axis (or x-axis) in a graph.

Absolute threshold: The point at which the subject or observer is able to detect a physical
stimulus about 50 per cent of the time.

Accidental samples: A type of nonprobability sample which is generally low in
representativeness and is obtained when easy availability and willingness to respond are the
major considerations in selecting the respondents.

Acquiescence: Tendency to endorse or agree with test items.

Adjusting for: Same as partialling out.

Adjusted mean: A mean of a group after adjusting for or partialling out the effect of a covariate
in ANCOVA.

Age differentiation: Discrimination based on the fact that older children possess greater
capabilities than younger children.

Age norms: A type of norms that displays the level of test performance for each separate age
group in the normative sample.

Age scale: A scale or test in which items are grouped according to age level (for example, Binet
test).

Alpha level: The significance level; probability of committing Type I error.

Alternative hypothesis (H1): The researcher's hypothesis about the outcome of a study.

Alternate-form reliability: A type of reliability in which alternate forms of the same test are
developed, given to a group of subjects and then they are correlated.

Analysis of covariance (ANCOVA): A method of statistical analysis based on ANOVA that uses
statistical control for removing the effect of an extraneous variable assumed or known to be
related to the dependent variable and tends to analyze the portion of the variance in the
dependent variable explained by one or more independent variables excluding the extraneous
variables.

Analysis of variance (ANOVA): A statistical test appropriate for analyzing interval data obtained
from the between-groups and within-subject experiment designs. The F test is computed for
analyzing total variance into different components to answer empirical questions like whether
two or more groups in an experiment significantly differ or not.

A posteriori comparison: Comparison not planned in advance for investigating specific
hypothesis concerning parameters of the population.


Applied research: Such a research seeks knowledge that will improve the present situation 
because here, abstract principles are applied in real-world settings. 

A priori or planned comparison: Comparison planned in advance to investigate specific 
hypothesis relating to parameters of the population. 

Archival data: A type of data that is based upon records or documents concerning individuals, 
institutions, governments and other groups; commonly used as an alternative or in 
conjunction with other research techniques. 


Archival research: A descriptive method of research in which already existing records are 
examined to test some research hypothesis. 


Assumption: Conditions that must be met before certain conclusions are drawn. 


Asymmetric measure of association: A measure of the one-way effect of one variable upon 
another. 


Asymmetrical transfer: Such a differential transfer occurs in the repeated-treatments design 
where transfer from one condition to a second is different from the second to the first 
condition. 

Attrition: A threat to the internal validity of a study that occurs when participants fail to complete
it; usually found in longitudinal studies. Those finishing the study may not be equivalent to those
who started it.

Balanced Latin square: A counterbalancing strategy in which each condition is preceded and
followed equally often by every other condition.

Balancing: A method of controlling extraneous variables by assuring that they affect the members
of the experimental and control groups equally.

Basal age: In Stanford-Binet scale, the highest year level at which the subject successfully passes
all tests. 

Basal level: A level for tests in which subtest items are ranked from easiest to toughest and below 
which the examinee would almost certainly answer all questions correctly. 

Baseline: In small N design, the initial stage in which the behaviour to be changed is monitored to 
determine its normal rate of response. 

Baseline stage: The first stage of a single N experiment in which a record of the individual's 
behaviour is made prior to any treatment or intervention. 

Basement effect: A problem relating to measurement, whereby the researcher cannot measure
the effects of an independent variable or its possible interaction because performance has
reached a minimum.

Basic research: A basic research is one in which abstract principles are identified for seeking 
knowledge about nature simply for the sake of understanding it better. 

Between-group variance: A measure of variability among groups in an experiment. 

Behaviour checklist: List of behaviours with predefined operational definitions that the 
investigators are trained for making use in any observational study. 

Beta: A concept in signal-detection theory that refers to the observer's overall willingness to 
report that a signal has occurred. 


Between-group design: An experimental design in which different groups of subjects serve
under different treatment conditions of the study. Also known as between-subjects design.


Bias in construct validity: A type of bias that occurs when a test is shown to assess different
hypothetical traits (psychological constructs) for one group than another or to assess the same
trait with different degrees of accuracy.

Bias in content validity: A type of bias that occurs when an item or subtest is relatively more
difficult for the individuals of one group than another, although the general ability level of the
two groups is held constant.

Biased sample: A sample in which the major characteristics are systematically different from
those of the parent population.

Bimodal distribution: A distribution of scores having two modes which need not have exactly the
same frequency, and each clearly larger than any of others.

Bivariate analysis: A statistical analysis that investigates the relationship between two variables.

Bivariate prediction: A prediction of scores on one variable based on scores of one other
variable.

Block randomization: A popular technique for carrying out random assignment in a random-
groups design; here each block includes a random order of conditions and in each condition
of the experiment, there are as many blocks as there are subjects.


Carryover effect: A relatively permanent effect that subjects in one condition have on their later 
behaviour in another condition. 


Case study: Intensive investigation of a particular individual or case; it does not allow inferences
of cause and effect and is merely descriptive in nature. 


Category scaling: A technique of scaling in which the subject provides a rating for each stimulus 


nthe scale with a fixed number of categories for indicating the absolute amount of quality 
cing Measured. 


Ceiling effect: Occurs when scores on two (or more) conditions are at or near the maximum possible
for the scale, giving the impression that no differences exist between these conditions.

Ceiling level: A level for tests in which subtest items are ranked from easiest to toughest and
above which the examinee would almost certainly fail all remaining questions.

Centiles: Divisions that order scores by separating them into 100 equal parts.

Central tendency: Descriptive measure indicating the centre of a distribution of scores; mean,
median and mode.

Central limit theorem: A mathematical principle that states that the distribution of sums (or
means) of scores taken at random from any distribution of individuals will tend to form a
normal or bell-shaped curve.

Checklist: A measure used to record the presence or absence of some behaviour in the situation
under observation.

Chi-square test: Statistical test appropriate for analyzing nominal data usually obtained with
between-groups designs.

Classical theory of measurement: A dominant theory in psychological testing which assumes that an
obtained score consists of a true score plus an error score of measurement.

Closed question: Questionnaire items in which response alternatives are provided.

Cluster (or aggregated) sample: A type of probability sample in which the sampling units are
aggregates of elements for which a sampling frame is available.

Coding: Step in data collection in which units of behaviour or events are identified and
classified according to specific criteria.




Coefficient of alienation (k): Measures the absence of relationship between X and Y and is
calculated by $k = \sqrt{1 - r^2}$.
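A worked value may help to fix the formula for the coefficient of alienation. The sketch below is illustrative only; the function name is a hypothetical one chosen for this example.

```python
import math

def coefficient_of_alienation(r):
    """Return k = sqrt(1 - r^2) for a correlation coefficient r."""
    return math.sqrt(1.0 - r * r)

# A correlation of 0.60 still leaves a large coefficient of alienation.
k = coefficient_of_alienation(0.6)
```

The example shows that k falls slowly as r rises: even a fairly substantial correlation leaves much of the relationship unaccounted for.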

Coefficient alpha: An index of reliability which is taken as the mean of all possible split-half
coefficients, corrected by the Spearman-Brown prophecy formula.
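In practice coefficient alpha is usually computed from item and total-score variances rather than by averaging split-half coefficients. The sketch below uses that common variance form, which is an assumption not stated in the definition above; the function name and the data layout are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Coefficient alpha from a score matrix.

    scores: list of rows, one row per examinee and one column per item.
    Uses alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
    """
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Two items that rank examinees identically give the maximum value.
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

When the items do not agree perfectly, the sum of item variances forms a larger share of the total variance and alpha falls below its maximum.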

Coefficient of determination ($r^2$): Ratio of the explained variation to the total variation.

Coefficient of multiple determination ($R^2$): A measure of the proportion of variation in the dependent
variable that is explained jointly by two or more independent variables.

Cohort design: A design consisting of a longitudinal study of several groups.

Cohort effect: A situation which is produced when age differences are confounded by 
differences in subject history. 

Complete counterbalancing: Occurs when all possible orders of conditions are used in a study 
based on within-subjects design. 
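Complete counterbalancing can be made concrete by listing every possible order of the conditions. The sketch below is illustrative only; the function name and condition labels are hypothetical.

```python
from itertools import permutations

def complete_counterbalancing(conditions):
    """Return every possible order of the given conditions."""
    return [list(order) for order in permutations(conditions)]

# Three conditions yield 3! = 6 orders; four conditions would yield 24.
orders = complete_counterbalancing(["A", "B", "C"])
```

Because the number of orders grows factorially with the number of conditions, complete counterbalancing quickly becomes impractical for more than a few conditions.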

Complete within-subjects design: A type of within-subjects design in which practice effects are 
controlled or balanced by administering the conditions several times to each subject so that 
the obtained results for each subject become interpretable. 

Complex design: An experiment in which two or more independent variables are manipulated 
and studied simultaneously. 

Componential Intelligence: The internal mental mechanisms in Sternberg’s theory of intelligence 
that are responsible for intelligent behaviour. 

Compromise design: See quasi-experimental design. 

Concurrent validity: A type of criterion-related validity in which the criterion scores are obtained
at approximately the same time as the test scores.

Confederate: A person offering his services to a researcher and who is instructed to behave in a 
certain way so that certain experimental treatment may be produced. 

Confirmation: A process of subjecting a statement to empirical test. 

Confirmatory experiment: An experiment in which a hypothesis is subjected to a test and
confirmed.

Confounding: Simultaneous variation of an extraneous variable with an independent variable so 
that any effect on the dependent variable cannot be attributed with certainty to be the effect of 
the independent variable. 

Constant: A term in a mathematical formula that does not vary with changes in conditions. 

Constancy of condition: A technique of controlling extraneous variables by keeping them at a 
constant value for all conditions throughout the experiment. 

Construct validity: A type of validity that indicates appropriateness of test-based inferences
about the underlying construct supposedly measured by the test.

Content analysis: A technique for making inferences on the basis of objective and systematic
identification of specific characteristics of messages.

Contextual intelligence: In Sternberg’s theory, those mental activities which are involved in 
adaptation to real-world environments relevant to one’s life. 

Contingency table: A two-way table constructed for classifying data, with the major objective of 
determining whether the two directions of classifications are dependent (or contingent) upon 
one another. 

Continuous variable: A variable that may assume any fraction and may change by any amount. 


Control group: That group which does not receive the treatment in an experiment.


Criterion: A standard used to assess the validity of a test.

Control variable: A potential independent variable that is held constant in an investigation.

Convergent validity: A type of validity in which a test correlates highly with other variables or
tests with which it shares an overlap of constructs, that is, with which it should correlate.

Correction for guessing: Practice of revising a subject's final score in light of apparent guessing.

Correlation: A statistic showing the extent to which two variables are related, without implying a
causal relationship; the magnitude of a correlation can vary from -1.00 to +1.00.

Counterbalancing: A technique used to order the conditions of an experiment so that the
distribution of order effects (practice and fatigue) is not confounded with the conditions of the
experiment.

Criterion-referenced test: A test where the purpose is to determine where the examinee stands
with reference to a defined educational objective. No comparison is made to the performance
of other examinees.
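The correction for guessing defined above is commonly carried out by subtracting a fraction of the wrong answers from the number right. The sketch below uses that conventional formula, which is an assumption not given in the glossary; the function and parameter names are hypothetical.

```python
def corrected_score(num_right, num_wrong, options_per_item):
    """Conventional correction for guessing: R - W / (k - 1).

    Omitted items are counted as neither right nor wrong.
    """
    return num_right - num_wrong / (options_per_item - 1)

# 40 right and 12 wrong on four-option items: 40 - 12/3.
score = corrected_score(40, 12, 4)
```

The formula assumes that every wrong answer results from blind guessing among all the options, which is rarely true in practice.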

Critical value of a test statistic: Value or values that separate the rejection and acceptance 
regions in the statistical test. 


Crossover interaction: A condition in which the effect of one independent variable on a
dependent variable reverses at different levels of a second independent variable.


Cross-sectional studies: These are studies in which a large sample of a population of various ages
is taken at one time and tested (opposite of longitudinal studies).


Cross-sequential design: A research design that integrates both cross-sectional and longitudinal
methods.


Crucial experiment: An experiment in which all possible hypotheses are simultaneously tested
so that the true hypothesis may easily be ascertained.

Crystallized Intelligence: Includes abilities in Cattell and Horn's theory, which have been
gained through the use of fluid intelligence.

Culture-fair test: A test which minimizes the irrelevant effects of cultural learning and social
bias and thereby produces a separation of natural ability from specific learning.

Cumulative record: A record that displays the total number of responses during an experimental
period as a function of time.

Concurrent validity: A type of criterion-related validity in which the criterion measures are
obtained at approximately the same time as the test scores.

Confidence interval: A region of scores (that is, the scores between an upper and lower value)
that is likely to include the true population mean; common confidence intervals are 95% and 99%.

Confidence limit: The two endpoints of a confidence interval.

Construct: An inferred hypothetical state or factor (thirst) which cannot be observed directly but
is inferred from certain behaviour (drinking).

Contingency table: A two-dimensional table showing frequency in each combination of
categories of two nominal variables.

Convenience sample: A type of nonprobability sample consisting of a group of people who meet the
general requirements of the study.

Correction for attenuation: Refers to an estimate of what the correlation between a test and
another measure would be if both had been perfectly reliable.
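The correction for attenuation is usually computed by dividing the observed correlation by the square root of the product of the two reliabilities. That formula is the standard one but is supplied here as an assumption, since the glossary does not give it; the function and parameter names are hypothetical.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between perfectly reliable measures.

    r_xy: observed correlation between test and criterion
    r_xx, r_yy: reliability coefficients of the two measures
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed r = .40 with reliabilities of .80 and .50.
estimate = correct_for_attenuation(0.40, 0.80, 0.50)
```

The corrected value is always at least as large as the observed correlation, because unreliability can only lower an observed relationship.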




Correlation matrix: A table that summarizes a series of correlations among several
variables.


Covariate: In ANCOVA, a variable that is controlled for or partialed out or held constant. 


Criterion validity: A form of validity in which a psychological test is able to predict some future
behaviour or is related meaningfully to some other measure.


Criterion variable: In regression analysis, the variable that is being predicted from the
predictor variable.


Cross-lagged panel correlation: A type of correlational research for dealing with the
directionality problem. Here if X and Y are measured at two different times and if X precedes Y,
then X might cause Y and Y cannot cause X.


Cross-validation: In psychometrics, the process of evaluating the validity of a test by
administering it to an independent new sample drawn from the same population as the original
sample. Test validity is computed on a different sample of persons from that on which items
were selected. In regression analysis, the measurement of predictor variables and comparison
of predicted and observed values of the dependent variable on a fresh sample drawn from the
same population as the sample from which the regression equation was derived.


Curvilinear correlation: Any association between two variables other than a linear correlation.

d′: A statistic in signal-detection theory that relates to the sensitivity of the observer.

Data: The scores obtained on the measure of the dependent variable.
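The sensitivity statistic d′ listed above is conventionally computed as the difference between the z-transformed hit rate and the z-transformed false-alarm rate. That formula is an assumption supplied for illustration, since the glossary does not state it; the function name is hypothetical.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf      # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# An observer with 84% hits and 16% false alarms.
sensitivity = d_prime(0.84, 0.16)
```

An observer who responds "signal" equally often on signal and noise trials obtains a d′ of zero, whatever the overall rate of responding.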


Data reduction: A process in analysis of behavioural data whereby the results are meaningfully
organized and important statements about findings are summarized.


Debriefing: A procedure by which the researchers inform subjects about all aspects of a study or
research after they have participated in it so that any negative consequences of the procedure
may be removed.


Deception: A strategy in research in which the participants are not told of all the details of the
experiment at its outset; generally used for the purpose of avoiding demand characteristics.

Deciles: Scales that divide the frequency distribution into equal tenths.


Degree of freedom: The number of values that are free to vary, assuming that the sum of values
and the number of values are fixed.


Dehoaxing: That part of the debriefing in which the true purpose of the study is explained to the 
participants. 

Demand characteristics: Cues within the research context or the situations of the experiment that 
guide the subject's behaviour. 


Dependent sample: Two or more groups for which the selection of the members of one group 
tends to determine the characters of the members of the other groups. 


Dependent variable: The variable measured and recorded by the experimenter in order to assess 
the effect of independent variables; it is hypothesized to be dependent on the value assumed 
by the independent variables. The dependent variable is also dependent upon extraneous 
variables, hence requires control procedures. 


Descriptive statistics: Statistical methods popularly used in organizing and summarizing data. 


Design of the experiment: A specific systematic plan for varying independent variables and 
noting the likely changes in the dependent variables. 


Dichotomous format: A test item format in which there are two alternatives for each item. 


Dichotomous scale: A nominal scale that consists of only two categories. 




Difference threshold: The increment or decrement in the differences of the two stimuli values
which can be detected 50 per cent of the time by an observer.

Differential transfer: A problem in a within-subjects design when performance in one condition
differs depending upon which of the two conditions comes first.

Differential validity: The extent to which a test has different meanings for different groups of
people. For example, a test may be a valid predictor of success in the concerned field for tribal
people but not for nontribal people.
Direct replication: A process of repeating an experiment as closely as possible to test whether or
not the same results will be obtained.

Directionality problem: In correlation research, the directionality problem refers to the fact that
for a correlation between X and Y, it is possible that X is causing Y but it is also likely that Y is
causing X. The correlation also does not provide any basis for deciding between these two
alternatives.
Discordant pair: Two cases are ranked in the opposite order on two variables.

Discrete variable: A variable in which each level represents a distinct category that is
qualitatively different from another category, for example, male and female.

Discriminant validity: A type of validity which results when a test does not correlate with
variables or tests from which it differs.


Dispersion: The extent of the spread or scatteredness in a distribution of scores.


Distractor: An alternative on a multiple-choice item that is apparently similar but not correct and
no credit is given for the same.


Distribution-free statistics: Statistical tests that make no assumptions about the form of the
distribution of the sample.

Divergent thinking: A kind of thinking that goes off in different directions.

Double-blind technique: A technique in which both the subject and the observer, or the
experimenter, are kept unaware (blind) of what treatment is being administered.


Ecological validity: The extent to which an experimental situation can be generalized to
behaviours occurring in natural settings of day-to-day living.

Effect size: Standardized measure of difference between population means. Effect size increases
with greater differences between means and tends to decrease with greater standard deviations
but is not affected by sample size.
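The properties of effect size described above can be seen in a small computation. The sketch uses one common index, Cohen's d with a pooled standard deviation for two groups of equal size; the choice of index, the function name and the figures are assumptions made for illustration.

```python
import math

def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Standardized mean difference for two groups of equal size."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd

d = cohens_d(105, 10, 100, 10)        # five-point difference, SD of 10
d_wider = cohens_d(105, 20, 100, 20)  # same difference, larger SD
```

The number of cases does not appear anywhere in the computation, which is why effect size, unlike a test of significance, is unaffected by sample size.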


Element: Each member of the population of interest.


Elimination: A technique for controlling the effect of extraneous variables by removing them from
the experimental situation.

Empirical approach: An approach which emphasizes the acquisition of knowledge from direct
observation and experimentation for answering questions.

Empiricism: A way of getting knowledge that relies on direct observation or experience.

Equal-interval variable: Those variables in which the numbers stand for about equal amounts of
what is being measured.

Error of estimation: Distance between an estimate and the true value of the parameter.

Error variance: The sum of all uncontrolled sources of variation that are likely to affect the
dependent variable; understood as the denominator of the t or F test.

Error variation: Nonsystematic variation among subjects within each group in a random-groups
design.




Evaluation apprehension: A form of anxiety by the participants that leads them to behave so as to
be evaluated positively by the experimenter.

Evidence report: The summarized results of an empirical study which are commonly used in
confirmation of a hypothesis.

Existential hypothesis: A conjectural statement which asserts that a relationship holds for at least
one case.

Expectancy effect: Tendency for the results of the experiment to be influenced by what the 
experimenter expects to find. Also known as Rosenthal effect, named after a psychologist who 
has studied this problem extensively. 

Expectancy table: A table that shows the established relationship between test scores and 
expected outcome on a relevant task. 

Experimental control: Conditions in experiment in which extraneous variables are held constant 
so that any effect upon the dependent variable can be easily attributed to the manipulation of 
the independent variable. 

Experimental design: A design for obtaining and treating data in which the experimental method 
is used. 

Experimental error: See Error Variance. 

Experiential intelligence: In Sternberg’s theory, the ability to deal effectively with novel tasks.

Experimental method: A method in which one or more independent variables are manipulated 
and responses on one or more dependent variables are measured. 

Experimental realism: The extent to which subjects are psychologically engaged in an 
experimental task such that they become less concerned with demand characteristics. 

Experimental units: Objects upon which measurements are taken. 

Experimenter bias: An effect that an experimenter may unknowingly exert on the results of an 
experiment. 

Ex post facto: Refers to conditions in an experiment where some manipulation has occurred 
naturally prior to the start of the experiment (literally it means from after the fact). 

Expected frequency: In chi-square test, the number of people in a category or cell expected if the 
null hypothesis is true. 
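For a contingency table, the expected frequency of a cell under the null hypothesis of independence is the row total multiplied by the column total, divided by the grand total. The sketch below applies that standard rule; the function name and the observed counts are hypothetical.

```python
def expected_frequencies(observed):
    """Expected cell counts for a two-way table under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Each cell: (row total * column total) / grand total
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [[30, 10],
            [20, 40]]
expected = expected_frequencies(observed)
```

The expected counts preserve the row and column totals of the observed table; the chi-square statistic then summarizes how far the observed counts depart from them.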

External validity: Extent to which the results of a study or a research can be generalized to 
different populations or settings. 

Extraneous variables: Variables which may potentially influence the results of a study but are not 
the variables whose effects are to be studied by the experimenter. 

Extravalidity concerns: Side effects and unintended consequences of testing. 

F test: A ratio between larger variance and smaller variance. 


Face validity: Not evidence of validity; the extent to which items on a test appear to be
meaningful and relevant.


Factor: Independent experimental variables which are related to a dependent or response
variable. In factor analysis, a group of variables that correlate maximally with each other and
minimally with variables not in the group.


Factor analysis: A set of multivariable data analysis methods for reducing larger matrices of 
correlations to fewer variables. 


Factorial design: A complex experimental design in which all possible combinations of the
selected values of each independent variable are used.




Factor loading: In factor analysis, correlation of a variable with a factor.

Factor matrix: A table of correlations between variables and factors. These correlations are called
factor loadings.

False alarm: Incorrectly telling about the presence of a signal on a trial where only noise
occurred.

False negative: In test-decision analysis, a case in which the test suggests a negative classification
yet the correct classification is positive; the subject is incorrectly predicted to fail on the
criterion.

False positive: In test-decision analysis, a case in which the test suggests a positive classification
yet the correct classification is negative; the subject is incorrectly predicted to succeed on the
criterion.

Field experiment: An experiment in which the experimenter manipulates one or more
independent variables in their natural settings to see the impact upon the dependent variables.

Field survey: A method of research in which subjects complete a questionnaire or interview in a
natural setting.

Filter questions: General questions framed in a survey to determine whether the respondent 
needs to be asked some specific questions later on. 

Fixed-effects model: A statistical model in which all values of the independent variables about
which inferences are to be drawn are specified in the design.


Floor effects: An effect that results when a task is too difficult, causing most or all scores to
fall at or near the lowest possible score.

Fluid intelligence: In Cattell & Horn’s theory, a largely nonverbal and culture-reduced form of 
mental efficiency. 

Forced-choice method: An item-writing method in which alternatives are matched for social
desirability and the raters are forced to choose between options that are equal in social
desirability.

Frequency: Number of scores or observations falling in a cell or classification category. 

Frequency distribution: A particular distribution in which a set of scores are arranged in 
ascending or descending order; the number of times each score occurs is also indicated. 

Friedman test: A nonparametric statistic appropriate for analyzing ordinal data obtained with a 
within-subjects design. 

Fundamental lexical hypothesis: In personality theory, the idea that trait terms have survived in
language because they have conveyed important information about our dealing with others.

Funnel questions: Questions dealing with a particular topic in a questionnaire, ordered from the
most general to the most specific.

g: In Spearman’s theory, a general factor of intelligence that must exist to account for the
observed correlations between several tests; a general mental ability that is thought of as a kind of
mental energy.

Generality of results: Whether or not a particular experimental result will be obtained under
different settings, such as using subjects from a different population.

Good subject role: A form of participant bias in which participants try to guess the experimenter’s
hypothesis (or mind set) and then behave in such a way as to confirm it.

Grade norm: A type of norm that displays the level of test performance for each grade in the
normative sample.

Grand mean (GM): In ANOVA, the overall mean of all scores regardless of what group they are in.




Group test: Mainly paper-and-pencil test suitable for testing large groups of persons at the same
time.

Halo effect: Rater’s tendency to rate a person high or low on all dimensions because of a global
impression.

Hawthorne effect: Condition where performance in an experiment is affected because the
subjects know that they are being observed.

Heavy-tailed distribution: A distribution that differs from a normal curve by being too spread out
so that the histogram of the distribution would tend to have too many scores at each of the two
tails.

Heritability Index: A broad estimate that shows how much of the total variance in a given trait is
due to genetic factors. This index may vary from 0.0 to 1.0.

High-order interaction: An interaction effect involving more than two independent variables in
experiments.


Histogram: A graphical method for describing a frequency distribution in which the height of the 
bars indicates the frequency of class intervals. 


History: A threat to internal validity that occurs when some historical event affects the
participants between the beginning of a study and its end.

Hit: A correct detection of signal presented to the observer. 

Homogeneity of variance: A statistical assumption showing equal variances among groups under
study.

Homogeneous test: A test in which the individual items tend to measure the same thing; this is
assessed by item-total correlations.

Homoscedasticity: A statistical assumption which asserts that the scatter of X values around Y
values is approximately equal.


Hypothesis: A testable statement of the relationship between variables that is advanced as a 
tentative solution to a problem. 


Incomplete within-subjects design: A type of within-subjects design in which each condition is
presented to each subject only once, and the order of presenting the various conditions is
varied across subjects such that practice effects are neutralized by pooling together the results
of all subjects.

Independent sample: Two or more groups for which the selection of the members of one group
does not have an impact upon the membership of the other group.

Independent variable: The variable manipulated by the experimenter for the purpose of
determining whether it influences behaviour.




Inferential statistics: Statistical measures for testing whether differences in the dependent 
variables (or responses) that are commonly associated with various conditions of an 
experiment are reliable; such statistics are employed to make inferences about population. 

Informed consent: The notion that persons should be given sufficient information about a study
so that their decision to participate as a research subject remains an informed one.

Instructional variable: Type of independent variable in which participants are given different sets
of instructions about how to perform.


Intelligence quotient: It is a ratio of the individual’s mental age (MA) as determined by the test to
actual or chronological age (CA). Thus IQ = (MA/CA) × 100.
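The ratio in the definition of the intelligence quotient can be checked with a short computation. The function name and the ages used are hypothetical; both ages must be expressed in the same unit.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100

# A child of 8 years who performs at the level of a typical 10-year-old.
iq = ratio_iq(10, 8)
```

A child whose mental age equals his or her chronological age obtains an IQ of exactly 100.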




Interaction: A condition in which the effect of one independent variable differs depending upon
the level of the second independent variable.

Intercept: In a two-dimensional graph, the point on the Y axis where X equals zero. In regression,
this is the point at which the regression line intersects the Y axis.

Internal consistency reliability: The extent to which items of a test assess a common
characteristic, usually measured by split-half reliability and Cronbach’s alpha.

Internal validity: The extent to which differences in dependent variables can be clearly attributed
to an effect of independent variables as opposed to the effect of extraneous variables. In other
words, the extent to which any experiment or research study is methodologically sound and
free from confounding.

Interobserver reliability: The extent to which two independent observers are in agreement with
each other.

Interrupted time series design: A form of quasi-experimental design in which a treatment is
evaluated by measuring performance several times prior to the institution of treatment and
several times after the treatment has been given.

Interval estimation: The determination of an interval within which the population parameter is
presumed to lie. Confidence interval is an example of an interval estimation.

Interval scale: A measurement scale that can be used to rank objects and on which the units
reflect equivalent magnitudes of the property being measured.

Interviewer bias: A kind of bias shown by the interviewer in which he tries to adjust the wording
of a question to suit the respondent or tries to record only the favourable position of the
respondent's answers.

Ipsative score: Such scores compare the individual against him or herself and thus each person
provides his or her own frame of reference.

Ipsative test: A test in which the average of subscales is always the same for every examinee; for
this, an individual examinee’s high score on some subscales must be balanced by low scores on
other subscales.

Item analysis: A set of methods used to assess item difficulty and item discriminability.

Item characteristic curve (ICC): A curve prepared for each item that shows the total test score on
the X-axis and the proportion of test takers passing the item on the Y-axis.

Item difficulty: An index of item analysis used to assess how difficult items are, the proportion
passing an item.

Item discrimination: An index of item analysis used to assess how well items separate groups who
score high and low on the test.

Item discrimination index: A statistical index that shows how efficiently an item discriminates
between persons who obtain high and low scores on the entire test.

Item reliability index: An estimate of a test item’s internal consistency as indexed by the
correlation with total score and its variability as indexed by the standard deviation.

Item response theory (IRT): A modern theory of test construction in which the psychometricians
postulate a single skill or underlying trait upon which all test items rely; it is also known
as latent trait theory because here each examinee is assumed to have a certain amount of the
trait being measured. The theory rests on the assumption that the probability of a
response to a test item is a function of one or more characteristics of the individual
and one or more characteristics of the item.
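Item difficulty and the item discrimination index defined above can be illustrated with a small computation. The sketch assumes items scored 0 or 1 and uses the difference between the proportions passing in the high-scoring and low-scoring groups as the discrimination index; that is one common form of the index and is an assumption here, as are the function names and data.

```python
def item_difficulty(responses):
    """Proportion of examinees passing the item (responses scored 0 or 1)."""
    return sum(responses) / len(responses)

def discrimination_index(upper_group, lower_group):
    """Proportion passing in the high-scoring group minus that in the low-scoring group."""
    return item_difficulty(upper_group) - item_difficulty(lower_group)

# Five of eight examinees pass the item.
p = item_difficulty([1, 1, 0, 1, 0, 1, 1, 0])

# Three of four high scorers pass; one of four low scorers passes.
d = discrimination_index([1, 1, 1, 0], [1, 0, 0, 0])
```

A positive index indicates that the item separates high and low scorers in the expected direction; a value near zero or below suggests that the item needs revision.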




Item-validity index: The product of a test item's standard deviation and the point-biserial
correlation with the criterion.

Just-noticeable difference: See difference threshold.

Kuder-Richardson formula 20: A formula for estimating the internal consistency reliability of a
test. For use of this formula, all items must be scored either 0 or 1.

Kurtosis: Extent to which distribution of scores deviates from a normal curve having tails that are
too thick or too thin.
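The Kuder-Richardson formula 20 listed above is conventionally written as KR-20 = [k/(k - 1)] × [1 - Σpq / variance of total scores], where p is the proportion passing an item and q = 1 - p. The glossary does not give the formula, so it is supplied here as an assumption; the function name and data layout are hypothetical.

```python
from statistics import pvariance

def kr20(scores):
    """KR-20 reliability for items scored 0 or 1.

    scores: list of rows, one row per examinee and one column per item.
    """
    n = len(scores)
    k = len(scores[0])
    p = [sum(row[i] for row in scores) / n for i in range(k)]   # proportion passing
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Two items answered identically by every examinee.
reliability = kr20([[1, 1], [0, 0], [1, 1], [0, 0]])
```

Because each item is scored 0 or 1, the product pq is simply the variance of that item, which is why the formula cannot be applied to items with other scoring schemes.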
Laboratory research: Research that is conducted within the controlled conditions of the scientific
laboratory.

Latin square design: A design in which a form of partial counterbalancing occurs where each
condition of the study occurs equally often in each sequential position, and each condition
precedes and follows each other condition exactly one time.

Law of large numbers: As the size of the sample increases, the standard error of the mean tends to
decrease.

Level of significance: Level against which the probability of an outcome under the null
hypothesis is held to be statistically significant or not significant; also known as alpha (α) level.

Light-tailed distribution: A type of distribution that tends to differ from a normal distribution 
curve by being too peaked or pinched so that a histogram of the distribution would have too 
few scores at each of the two tails. 

Likert scale: A scale constructed by Likert which presents the examinee with five responses
ordered on an Agree/Disagree continuum.

Linear relationship: Such relationship between two variables in which an increase in one
variable is always followed by a constant increase or constant decrease in the other variable.

Local norms: Norms based upon a representative local sample as opposed to a national sample.

Longitudinal study: A method of study in which individuals are studied over time, and
measurements are taken on the individuals at various time intervals.

Main effect: The overall effect of an independent variable in a factorial design.

Mann-Whitney U test: A nonparametric statistic for comparing two populations based on
independent random samples from each.

Marginal mean: In factorial design in ANOVA, mean score for all participants at a particular level
of one of the variables.

Matched-groups design: A type of design in which the experimenter forms comparable groups by
matching subjects on a pretest task and subsequently, randomly assigning the members of
the matched sets to the different conditions of the experiment.

McCall’s T: A standardized score system having a mean of 50 and standard deviation of 10. It can
be easily obtained from a simple linear transformation of z score, that is, T = 10z + 50.
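The linear transformation given for McCall’s T can be applied directly. The sketch below is illustrative; the function name and the z scores are hypothetical.

```python
def mccall_t(z):
    """Convert a z score to McCall's T: T = 10z + 50."""
    return 10 * z + 50

t_average = mccall_t(0.0)   # a score exactly at the mean
t_high = mccall_t(1.5)      # one and a half standard deviations above the mean
t_low = mccall_t(-2.0)      # two standard deviations below the mean
```

The transformation removes negative values and decimals from most scores while leaving the relative standing of examinees unchanged.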


Mean: A measure of central tendency, the sum of all scores divided by their number; more
popularly known as arithmetic mean.

Mean square: Sum of squared deviations from the mean divided by degree of freedom. Also
known as variance estimate.
Measurement: A process of assigning numbers to objects or events according to rules.


Measurement error: Everything other than the true score that makes up an examinee's obtained
test score.

Mechanical subject loss: A condition which occurs when a subject fails to complete the
experiment.



Median test: A statistical tool appropriate for analyzing ordinal data obtained for a
between-subjects design.

Mental age: A unit for expressing the results of intelligence tests.

Mesokurtic: A distribution which is neither peaked nor flat and, therefore, normal. So a
mesokurtic distribution is a normal distribution.

Meta-analysis: Statistical procedures for combining and describing the results from different
studies.

Method of empirical keying: A method of scale development in which test items are selected
based entirely on how well they contrast with a criterion group taken from a normative
sample.

Method of rational scaling: A scale construction method in which all items of the scale correlate
positively with each other and also with the total score for the scale. Also known as the
internal consistency approach.


Mixed design: A complex design in which different designs are used for the different independent
variables in the experiment.

Mode: A measure of central tendency; that score which occurs most frequently in the
distribution.

Monotonic relationship: A relationship between two variables in which an increase in one
variable is always accompanied by a consistent increase or decrease in the other variable.

Multiple-baseline design: A single N experimental design in which the effect of a treatment is
displayed by demonstrating that behaviours in more than one baseline change as a
consequence of the introduction of a treatment; in fact, multiple baselines are established for
different individuals, for different behaviours in the same individual or in different situations.

Multiple correlation coefficient (R): A measure of linear correlation between a criterion or
dependent variable and two or more predictor or independent variables.

Multitrait-multimethod matrix: A type of research design for assessing convergent and
discriminant validity that calls for the assessment of two or more traits by two or
more methods.

Multivariable analysis: A set of methods for data analysis that considers the relationships among
combinations of three or more variables.

Multivariate analysis of covariance (MANCOVA): Analysis of covariance with more than one
dependent variable.

Multivariate analysis of variance (MANOVA): Analysis of variance with more than one
dependent variable.

Mundane realism: The extent to which the experiment mirrors real-life experiences. Generally
considered less important than experimental realism.

Natural groups design: A type of design in which the conditions tend to represent the selected
levels of a naturally occurring independent variable, such as the age of the subject.

Naturalistic observation: A type of observation made in more or less natural settings without any
kind of intervention by the observer.

Negative correlation: A particular type of relationship between two variables in which high
scores on one go with low scores on the other, lows with highs and mediums with mediums.

Nominal measurement: A measurement by which the subjects are placed into qualitatively
different categories.





Nondirectional hypothesis: A research hypothesis that does not predict a particular direction of 
difference between populations. 

Nonlinear effect: Any outcome that does not form a straight line when plotted on a graph. It 
occurs only when the independent variable has more than two levels. 

Nonparametric statistics: See distribution-free statistics. 

Nonparametric statistical test: A statistical test that makes no assumptions about population 
parameters; also known as distribution-free test. 

Nonprobability sampling: A type of sampling procedure in which there is no way to estimate the 
probability of each element in the population being included in the sample. 

Nonreactive measure: A measure of behaviour in which the subject is not aware of being
observed and the behaviour is not changed by the process of observation.

Normal distribution: A symmetric, mesokurtic, bell-shaped form of distribution of scores. 

Normalized standard score: A score that is obtained by a transformation that renders a skewed 
distribution into a normal distribution. 

Normative sample: A group consisting of individuals who have been administered a test with the
standard instructions, format and general procedures outlined in the manual for administering
the test. Also called the standardization sample.

Norm-referenced test: A test that assesses each person relative to a normative group. 

Norms: An average performance of the standardization sample on the test. 

Null hypothesis: An assumption used as the first step in statistical inference, whereby the
independent variable is said to have no effect upon the dependent variable; that is, the
dependent variable is not influenced by the manipulation of the independent variable.

Observer bias: Systematic errors occurring in observations due to the observer's expectations 
regarding the outcome of a study. 

One-shot, or cross-sectional study: A survey design in which one or more samples of the
population are taken and information is collected from the samples at one time.

One-tailed test: A statistical procedure for testing the null hypothesis in which the entire rejection 
area is kept at one end of the sampling distribution. 

Open-ended question: A question that usually cannot be answered specifically and requires the
examinee to produce something spontaneously.


Operant level: See baseline stage. 


Operational definition: A definition of a concept in terms of operations that must be completed
for defining that concept, that is, definition of the concept in terms of which it is measured. For
example, intelligence is what the intelligence test measures.

Order effects: The influence on a particular trial that arises from its position in the sequence of
trials.

Ordinal scale: A scale that orders or ranks objects or events from most to least, or least to most
without providing information on the exact distance between scale points or categories.

Ordinate: The vertical axis (Y-axis) in a graph.

Outlier: Score with an extreme (very high or very low) value in relation to other scores in the
distribution.

Parallel forms: Two alternative forms of a test having equal means, equal variances and equal
inter-item correlation.



Parameter: A measure obtained from all possible observations in a population. 


Partial correlation: A technique of correlation in which two variables are correlated while the
effect of the third variable is held constant.

Partial counterbalancing: Occurs when a subset of all possible orders of conditions is used in a
within-subjects design.


Partial replication: Repetition of a portion of some prior research.

Participant bias: Occurs when the behaviour of the subjects or participants is influenced by their
beliefs about how they are supposed to behave in a study.

Percentile: Percentage of the individuals in the standardization sample who scored below a
specific raw score. It varies from 0 to 100.

Percentile band: The range of percentiles that are likely to represent a subject's true score;
developed by forming an interval one standard error of measurement above and below the
obtained score and converting the resulting values into percentiles.

Percentile rank: Proportion or percentage of scores that fall below a particular score.

Personality coefficient: A term used by Mischel to refer to the finding that the predictive validity
of personality scales rarely exceeds 0.30; an attack on the trait concept of personality.

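
The percentile rank entry above can be sketched as a short function. The name `percentile_rank` and the data are hypothetical, and counting only scores strictly below the target follows the wording of the entry; some texts also add half of the tied scores, which this sketch does not do.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the distribution that fall below `score`."""
    if not scores:
        raise ValueError("scores must not be empty")
    below = sum(1 for s in scores if s < score)
    return 100.0 * below / len(scores)


# Hypothetical standardization sample of ten raw scores.
sample = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(sample, 80))  # five of ten scores are below 80
```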
Phenomenological report: Subject's description of his or her own behaviour or conditions of
mind in a study; also called subjective report.

Pilot study: A preliminary study in the initial stages of research in which some data are
collected; the problems spotted help the researchers refine the procedures and prevent the
full-scale study from being flawed from the point of view of methodology.

Placebo effect: Improvement in performance regardless of the nature of the treatment; this is
because here a suggestion is made that participation in the research study is likely to result in
something positive happening to the participant.

Plagiarism: Taking the ideas of others deliberately and claiming them as one's own.

Point of subjective equality (PSE): That point which is perceived as psychologically equal to a
specified standard on a physical scale.

Point scale: A scale or test in which points such as 0, 1 or 2 are assigned to each item.

Polytomous format: Also known as polychotomous format, a format for objective tests in which
three or more alternative responses are given to each item.

Population: A well-defined group of people, objects, etc.

Positive correlation: A relationship between two variables in which high on one is associated
with high on the other and low on one is associated with low on the other.

Post hoc fallacy: Faulty causal inferences from the correlational data.

Power test: A test that allows enough time for the examinees to attempt all items.

Practice effects: The influence on performance which arises from practicing a task.

Predictive validity: A type of criterion-related validity in which the criterion measures are
obtained in future, usually months or years after the test scores are obtained.

Predictor variable: In regression analysis, the variable used to predict the criterion variable.

Pretest-post-test design: A popular research design in which subjects are measured before and
after a treatment.

Primary mental abilities: A group of seven factors of intelligence posited by Thurstone.

Probability: An estimate of the likelihood that a particular thing or event will occur.

Probability sampling: A sampling technique in which the probability of an element being
included in the sample is specified.




Problems: Research questions that initiate scientific inquiries.

Progressive effect: Any sequential effect in a within-subjects design in which the accumulated
effects are assumed to be the same from trial to trial (such as fatigue).

Projective hypothesis: A hypothesis which states that the personal responses towards the
ambiguous stimuli reflect the unconscious needs, motives and conflicts of the individual.

Projective test: A test in which the examinee is given a vague, ambiguous stimulus and responds
with his or her own ideas or constructions.

Prophecy formula: A formula developed by Spearman and Brown that is used for correcting the
loss of reliability that occurs when the split-half method is used. The method is also used to
estimate how much the test length must be increased to bring the test to a desired level of
reliability.
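
Both uses of the prophecy formula can be sketched as follows. The entry does not print the formula itself; the sketch assumes its standard form, r_k = k·r / (1 + (k − 1)·r), where k is the factor by which the test is lengthened. The function names and example values are hypothetical.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability when a test with reliability r is lengthened k times.

    With k = 2 this steps a split-half correlation up to full-test length.
    """
    if not 0 < r < 1:
        raise ValueError("r must lie strictly between 0 and 1")
    return k * r / (1 + (k - 1) * r)


def length_factor(r, target):
    """Factor by which the test must be lengthened to reach `target` reliability."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("r and target must lie strictly between 0 and 1")
    return target * (1 - r) / (r * (1 - target))


half_test_r = 0.60  # hypothetical correlation between the two halves
print(spearman_brown(half_test_r))       # full-length estimate
print(length_factor(half_test_r, 0.90))  # lengthening needed to reach 0.90
```

The second function is simply the first solved for k, so feeding its result back into `spearman_brown` should return the target reliability.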

Pseudo problems: Research questions that may appear to be real questions but, on further
examination, clearly cannot be attacked through empirical means.

Psychological abstracts: A monthly publication that summarizes the studies recently published
in different journals of psychology.

Psychological testing: It refers to all the possible uses, applications and underlying concepts of
psychological tests.

Psychophysics: A study of the quantitative relationship between the physical stimulus and the
resulting psychological sensation.

Purposive manipulation: Systematic control of an independent variable during experimentation.

Purposive sample: A type of nonprobability sample in which the elements to be included in the
sample are selected by the researcher on the basis of special characteristics or typicalities of
the respondents.

P × E factorial design: A factorial design in which at least one subject variable (P means person
variable) and one manipulated variable (E means environmental variable) are found.



Q technique: A technique for studying changes in self-concept or other variables by sorting
statements into assigned categories.

Qualitative factor: Factors that are not quantitative.

Quantitative factor: Factors that may take on different values corresponding to the points on a
real line.

Quartile deviation (Q): Half of the difference between the 3rd and 1st quartiles, where the
quartiles divide the frequency distribution into equal fourths.
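
The quartile deviation can be sketched as Q = (Q3 − Q1) / 2. The sketch assumes Python's `statistics.quantiles` with its default "exclusive" method for locating the quartiles; textbooks differ on the exact interpolation rule, so results on small samples may vary slightly from hand calculations. The function name and data are hypothetical.

```python
from statistics import quantiles


def quartile_deviation(scores):
    """Half the distance between the third and first quartiles."""
    q1, _median, q3 = quantiles(scores, n=4)  # three cut points for four equal parts
    return (q3 - q1) / 2


scores = list(range(1, 12))  # hypothetical scores 1 through 11
print(quartile_deviation(scores))
```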

Quasi-experimental design: A research design that resembles a true experimental design but
subjects are not randomly assigned to treatments nor are the treatments randomly assigned to
the group; such a design is basically an attempt to simulate the true experiment whose criteria
are only partially met, that is, control over manipulation of independent variables is somehow
achieved but control over assignment of subjects into equivalent groups is not achieved.

Random assignment: A procedure adopted in placing the subjects in groups or to order events
such that only chance factors may determine the placement or ordering.

Random-groups design: A type of between-subjects design in which random assignment of
subjects to different conditions is done.

Random sampling: A procedure of sampling in which each element of the population has an
equal chance of being selected in the sample, and also selection or nonselection of one
subject cannot influence the selection of the other.




Range: A measure of dispersion, a difference score obtained by subtracting the smallest score
from the largest score in the distribution.

Rank-order scaling: A procedure of ranking in which the subjects are presented all the stimuli or
objects simultaneously with a request to rank them from highest to lowest or vice versa.

Rater bias: A tendency for a rater's ratings to be inaccurate because of leniency, severity and
other types of evaluation errors.

Ratio scale: A scale in which there are equal intervals between scale values and there is an
absolute zero.

Raw score: An elementary and basic level of information provided by a psychological test, for
example, the number of questions answered correctly.

Reactive measure: Measure of behaviour under circumstances in which the participants are aware
that their behaviours are being observed or there is reason to believe that the observation or
measurement procedures will influence the participant's behaviour.

Reactivity: A special phenomenon that causes the reliability of a scale to be higher when an
observer knows that his or her work is being monitored.

Regression equation: An equation that describes the best-fitting straight line for estimating the
criterion from the test.

Relevant variable: A variable that has been shown to influence the behaviour (or dependent
variable) directly or indirectly.

Reliability: The extent to which the same observations or scores are obtained in repeated studies;
usually computed by a correlation between scores obtained by the same subjects on two
forms of the test or scores on the same test at two different time intervals or scores obtained on
each half of the test.

Reliability coefficient: Ratio of true score variance to total variance of the test scores. 
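
The ratio in the reliability coefficient entry can be written out directly. The sketch assumes the classical test theory decomposition in which total variance is the sum of true-score variance and error variance; the function name and the variance values are hypothetical.

```python
def reliability_coefficient(true_variance, error_variance):
    """Ratio of true-score variance to total variance of the test scores."""
    if true_variance < 0 or error_variance < 0:
        raise ValueError("variances must be non-negative")
    total_variance = true_variance + error_variance  # assumption: classical test theory
    if total_variance == 0:
        raise ValueError("total variance is zero")
    return true_variance / total_variance


# Hypothetical test: true-score variance 80, error variance 20.
print(reliability_coefficient(80, 20))
```

In practice the true-score variance is not observed directly, so the coefficient is estimated with methods such as test-retest or split-half correlations.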


Repeated-measures design: A design in which participants are tested in each of the
experimenter's conditions. Also known as within-subjects design.

Replication: Repeating an experiment that occurs primarily when the results of some prior study 
are suspected to be erroneous. 

Representativeness: A sample is said to be representative if it has the same distribution of major 
characteristics as the population from which it was selected. 

Representative sample: A sample consisting of individuals with characteristics similar to those for 
whom the test is ultimately needed. 


Research hypothesis: The assertion that an independent variable will have an effect upon a
dependent variable.

Residual: The difference between predicted and observed values from a regression equation.

Respondent mortality: Condition that results when a respondent fails to complete all phases of a
study or longitudinal research.

Response bias: A tendency of the respondent to choose a particular response in a
questionnaire for an extraneous reason, not related to the variable that the response
is supposed to indicate but related to the content of the question.

Response style: A tendency to mark a test item in a certain way, irrespective of meaning or
content of it.

Reversal design: A research design in which the experimenter alternates between the baseline
condition and the treatment condition.




Reverse counterbalancing: Occurs in a within-subjects design when the subjects or participants
are tested more than once per condition; they are given one sequence, then a second with the
order reversed from the first such as ABBA or ABCCBA.
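
The ABBA and ABCCBA sequences mentioned above can be generated mechanically. This is only an illustrative sketch; the function name is hypothetical.

```python
def reverse_counterbalance(conditions):
    """Return the conditions in their given order followed by the reversed order."""
    order = list(conditions)
    return order + order[::-1]


print("".join(reverse_counterbalance("AB")))   # ABBA
print("".join(reverse_counterbalance("ABC")))  # ABCCBA
```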

Robust test: A statistical test from which statistical inferences are likely to be valid even when 
there are departures from normality in the population distribution. 

Rosenthal effect: See expectancy effect. 

Routing procedure: A procedure in which first items or subtests are administered for the purpose 
of determining the appropriate starting points for subsequent subtests. This procedure has 
been included in Stanford-Binet fifth revision. 

Sample: A group of persons selected from the population. 

Sampling distribution: A theoretical probability distribution of a statistic that would result from 
drawing all possible samples of a given size from the same population. 

Savant: A mentally deficient person but who shows a highly developed talent in a single area 
such as art, memory, music, etc. 

Scale: A set of numerals assigned to objects or events indicating the relative amounts of some 
characteristics that are possessed by those objects or events. 

Scaling techniques: Scientific procedures for measuring stimuli on a psychological dimension.

Scatter-plot (Scatter diagram): A group of scores made by the same individuals on two different 
variables, providing a pictorial representation of the degree of relationship between these two 
variables. 

Self-report questionnaire: A questionnaire that provides statements about an individual who is
required to answer each statement as 'True' or 'False'.

Semantic differential scale: A rating technique in which the rater uses a seven-point scale to rate
a concept on a number of bipolar adjectives such as good-bad, strong-weak, active-passive.

Sequence effect: Occurs in a within-subjects design when experience of participating in one of
the conditions of the study influences performance in subsequent or later conditions.

Serendipity: An unexpected but more valuable finding than the original purpose of the research.


Shrinkage: Sometimes a regression equation, created for one group, is used to predict the
performance of another group of participants. This procedure tends to overestimate the
magnitude of the relationship for the second group. The amount of decrease in the strength of
the relationship from the original sample to the sample with which the equation is used, is
technically known as shrinkage.

Significance: Significance has two basic dimensions: statistical significance and psychological
significance. Statistical significance indicates whether the obtained results are a common or
rare event if only chance is operating. Psychological significance indicates qualities of data,
adequacy of the data obtained and the clarity of the obtained results.


Significance level: See level of significance. 

Simple main effect: The effect of one independent variable at one level of a second independent
variable in a factorial design.

Single-factor design: Any experimental design with a single independent variable.

Single-factor multilevel design: Any design with a single independent variable and more than
two levels of the independent variable.

Single-N experimental design (or Single-subject experimental design): An experimental
procedure that focuses on behaviour changes in an individual by systematically observing and
monitoring the individual's behaviour.




Situation sampling: The random or systematic selection of the situations in which observations
are to be made for the purpose of ensuring the goal of representativeness.

Situational variable: A type of independent variable in which subjects encounter different
environmental situations or circumstances.

Skewed distribution: A nonsymmetrical distribution; in a negatively skewed distribution,
extreme scores are below the mean of the distribution and in a positively skewed distribution
extreme scores are above the mean of the distribution.

Skewness: Asymmetry of a frequency distribution; positive skewness shows that scores are piled
up at the low end and negative skewness shows that the scores are piled up at the high end.

Social desirability: A condition in which the respondent tends to answer in terms of what is
socially most desirable or acceptable rather than in terms of what they believe to be true.

Social desirability response set: Tendency on the part of the examinee to respond to test items in
the perceived desirable manner.

Social intelligence: A mental ability to understand other people and to relate effectively to them.

Source traits: Stable and constant traits that control behaviour, less visible than surface traits but
more important in accounting for behaviour.

Spearman-Brown prophecy formula: A formula for adjusting split-half correlations so that they
reflect the full length of a scale.

Speed test: A test which has a strict time limit, such that few subjects will finish answering all
items; it contains items of more or less uniform difficulty.

Split-ballot technique: A technique used in questionnaire construction in which different
wordings of the same questions are used for two (or more) equivalent groups of respondents to
examine the effect of wording directly.

Split-half reliability: A method of estimating reliability in which test items are divided into two
equal halves and subsequently, scores obtained on these two halves are correlated.

Spurious correlation: When evidence falsely indicates that X variable and Y variable (sometimes
more than two variables) are associated.

Spurious relationship: Occurs when the zero-order relationship between an independent and
dependent variable disappears or becomes significantly weaker with the introduction of a
control variable.

Standard deviation: A measure of dispersion; square root of the sum of squared deviations of
each score from the mean divided by the number of scores.

Standard error: Standard deviation of the sampling distribution.

Standard error of estimate: A statistical index for the accuracy of a regression equation. It is
defined as the standard deviation of the residuals from a regression analysis. When the standard
error of estimate is small, prediction is considered most accurate.
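
The definitions of standard deviation and standard error of estimate above can be tied together in a short sketch. It follows the glossary wording literally: the standard deviation divides by the number of scores, and the standard error of estimate is that same standard deviation applied to the residuals of a least-squares line. Many statistics texts instead divide the residual sum of squares by N − 2, so treat the divisor here as an assumption. The function names and data are hypothetical.

```python
import math


def standard_deviation(scores):
    """Square root of the mean squared deviation from the mean (divide by N)."""
    n = len(scores)
    m = sum(scores) / n
    return math.sqrt(sum((x - m) ** 2 for x in scores) / n)


def standard_error_of_estimate(x, y):
    """Standard deviation of the residuals from a least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted by the regression line.
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return standard_deviation(residuals)


test_scores = [1, 2, 3, 4, 5]       # hypothetical predictor values
criterion_scores = [2, 4, 5, 4, 5]  # hypothetical criterion values
print(standard_error_of_estimate(test_scores, criterion_scores))
```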


Standard error of measurement: An index of measurement error that indicates the extent to
which an examinee's score might vary over a number of parallel tests.

Standard score: A derived score in which the original score is expressed as the distance from the
mean in standard deviation units; a standard score has a fixed mean and a fixed standard
deviation.

Standardization fallacy: A fallacious view that a test standardized on one sample is ipso facto
unfair when used in another population.

Stanine: A system for assigning the numbers 1 through 9 to a test score.




Statistic: A numerical value that is computed from the observations of a sample taken from a
population.

Statistical determinism: An assumption made by research psychologists that behavioural events
can be predicted with a probability greater than chance.

Sten scale: A 10-unit scale with five units above and five units below the mean.

Stratified random sampling: A type of probability sampling in which a population is divided into
various strata and random samples are taken from each of these strata.
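
Stratified random sampling, as defined above, can be sketched in a few lines. The function name, the `stratum_of` callback and the toy population are all hypothetical, and drawing an equal number of units from every stratum is just one possible allocation; proportional allocation is another common choice.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Divide the population into strata and draw a random sample from each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    # Sample without replacement inside each stratum (capped at the stratum size).
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }


# Hypothetical population of 20 respondents tagged by residence.
people = [("person%d" % i, "urban" if i % 2 else "rural") for i in range(20)]
sample = stratified_sample(people, stratum_of=lambda p: p[1], n_per_stratum=3, seed=1)
for stratum, members in sample.items():
    print(stratum, [name for name, _ in members])
```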


Structured Interview: An interview conducted under well-defined procedure in which questions 
or sequence of questions are predefined and predetermined. 


Subgroups norms: Norms derived from an identified subgroup as opposed to a national sample 
that is diversified. 
Subject: The organism or object on which observation or manipulation is done. 


Subject variable: A type of independent variable that is selected rather than manipulated by the
researcher; it refers to an already existing attribute of the individual chosen for the
investigation, such as age or gender.


Successive independent samples: A research design in which several cross-sectional surveys are 
done and the same questions are asked from each succeeding sample of participants or 
respondents. 


Surface traits: Traits which are readily observable and are most obvious aspects of personality. 


Systematic measurement error: A type of measurement error that arises when, without the
knowledge of the test developer, a test consistently measures something other than the trait
which it is intended to assess.


Systematic observation: Observations done according to some explicit procedures, as well as in 
accordance with the logic of scientific inference. 


Systematic variance: Variability that can be attributed to some identifiable source, either
systematic variation of the independent variable or the uncontrolled variation of a
confounding variable.


Task variable: A type of independent variable in which participants are given different types of 
tasks to perform. 


Temporal validity: The extent to which the obtained experimental results can be generalized to
other time frames.


Test battery: A group of tests, the scores of which are used together in assessing an individual. 

Testable statement: A statement that may be directly or indirectly subjected to empirical
verification.

Test-retest reliability: A method of estimating reliability in which the same test is given twice to 
the same group of heterogeneous subjects and the resulting scores are correlated. 

Theory: A statement of relationship between two or more than two variables. 


Third variable: A variable that may account for the observed relationship between two other 
variables. 


Threats to validity: Possible factors or causes which must be controlled so that a clear cause-and- 
effect inference can be drawn. 


Tied pair: Two cases are ranked similarly on one or both of two variables. 


Time sampling: Systematic or random selection of observation intervals for ensuring a
representative sample of behaviour.







t-ratio: A test statistic for determining the significance of a difference between two means.

True score: An individual's hypothetical real score on a test; it can be determined on a
probability basis and is never directly known.

T-score: A standard score with a mean of 50 and standard deviation of 10.

Two-tailed test: A statistical procedure for testing the nondirectional null hypothesis in which the
rejection areas are placed at both the ends of the sampling distribution.

Type I error: An error in which the null hypothesis is rejected even when it is true.

Type II error: An error in which the null hypothesis is accepted even when it is false.

Unimodal distribution: Frequency distribution with one value clearly having a larger
frequency than any other.

Universal hypothesis: A hypothesis that asserts that the relationship for the given variables holds
for all times and at all places.


Unobtrusive measure: Any measure of behaviour that can be recorded without the participants'
knowledge that their behaviour has been observed.

Unstructured interview: An interview conducted without any specific or particular questions or
sequences of questions.

Validity: The extent to which the test measures what it intends to measure. 

Validity shrinkage: A discovery in cross-validation research that a test predicts the relevant
criterion less accurately with the new sample of examinees than with the original tryout
sample.

Value: A shared and enduring belief about ideal modes of behaviour or end states of existence.

Variable: Anything that can assume different numerical values and can be measured or
manipulated.

Variance: The square of the standard deviation; a measure of the average squared deviation of a
set of scores from the mean score.

Verification: A process of determining whether a hypothesis is probably true or false.

Volunteer bias: The bias that results from the fact that a given sample contains only those subjects
who have shown their willingness to participate in the study.

Weber's law: The just-noticeable difference between two stimuli can be stated as a ratio
between the two stimuli that is independent of their magnitude.

Weighted mean: The sum of the mean of each group multiplied by its respective weight (the N in
each group), divided by the sum of the weights (total N).
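
The weighted mean described above can be computed as in the following sketch, in which each group mean is weighted by its group size. The function name and the group values are hypothetical.

```python
def weighted_mean(group_means, group_sizes):
    """Sum of each group mean times its N, divided by the total N."""
    if len(group_means) != len(group_sizes):
        raise ValueError("need one size per group mean")
    total_n = sum(group_sizes)
    if total_n == 0:
        raise ValueError("total N must be positive")
    return sum(m * n for m, n in zip(group_means, group_sizes)) / total_n


# Hypothetical groups: mean 10 with N = 30, and mean 20 with N = 10.
print(weighted_mean([10, 20], [30, 10]))  # (10*30 + 20*10) / 40
```

Because the first group is three times larger, the result lies closer to 10 than the unweighted average of 15 would.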

Withdrawal design: Any small N design in which a treatment is in place for a time and is then
removed to determine if the rate of behaviour returns to the baseline.

Within-group variance: A measure of dispersion or variability among subjects in the same group.

Within-subjects design: An experimental design in which each subject is tested under more than
one level of the independent variable.

Written survey: A survey method in which the investigator prepares a written questionnaire that
is filled out by the participants.

y-axis: The vertical axis of the graph.

Y prime: The value of y that falls on the regression line above any x. It is symbolized as y′.







Yoked control group: A control group in which the treatment given to a member of the control 
group is matched exactly with the treatment given to a member of the experimental group. 

Z-score: A standard score showing the distance of each score from the mean of the distribution in
terms of standard-deviation units.

Z-test: Hypothesis testing procedure in which there is a single sample and the population 
variance is known. 
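
A single-sample Z-test of the kind defined above can be sketched with the standard library. The entry gives no formula, so the sketch assumes the usual statistic z = (sample mean − population mean) / (σ / √n) and a two-tailed p-value from the standard normal distribution. The function name and the figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, population_mean, population_sd, n):
    """Return (z, two-tailed p) for one sample when the population SD is known."""
    if population_sd <= 0 or n <= 0:
        raise ValueError("population_sd and n must be positive")
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical example: 36 examinees average 105 on a test with mu = 100, sigma = 15.
z, p = z_test(sample_mean=105, population_mean=100, population_sd=15, n=36)
print(round(z, 2), round(p, 4))
```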




